Real-world example
real_world_example.Rmd1. Overview
The rosario package implements a null model analysis to
quantify concurrent temporal niche overlap (i.e., activity or phenology)
among biological identities (e.g., individuals, populations, or species)
using the Rosario randomization algorithm (Castro-Arellano et al. 2010).
It is designed for cyclical time data (e.g., hours of the day, months of
the year) and can be used to assess temporal overlap among two or more
biological identities.
This vignette provides a real-world example showing how to (1) use an
external camera-trap dataset, (2) convert detections into fixed time
bins, (3) build the matrix required by rosario, and (4) run
a null model test of assemblage-wide temporal overlap.
2. Data Loading
We use a subset of the SIMDeer project (Foster et al. 2020) representing 85,929 capture events from three cervid species recorded in British Columbia, Canada (51,691 Mule deer, 19,395 Elk, and 14,843 White-tailed deer). The full dataset is available through Wildlife Insights (https://app.wildlifeinsights.org/explore).
The dataset used in this vignette is:
-
Sim_data: a CSV file containing capture events (timestamped detections) for three cervid species.
Note: To run this vignette locally with the full data set, download and load the external data using the code below. The data is hosted on Figshare. This step is not run automatically to comply with CRAN policies.
options(timeout = 600)
download.file("https://ndownloader.figshare.com/files/61985230", destfile = "Sim_data.csv", mode = "wb")3. Organizing detections into time bins
The rosario workflow requires a matrix where:
- rows = biological identities (in this example: species),
- columns = time intervals across a cycle (in this example: 30-minute bins across a 24-hour day),
- values = counts (or proportions) of detections in each time interval.
We convert event-level timestamps into a single row of binned counts per species. Time intervals are treated as circular (e.g., hours of the day), so the cycle wraps from the last interval back to the first.
3.1. Helper function to bin detections for one species
The function below converts timestamps to date-time, assigns each
detection to a 30-minute bin, and returns a one-row data frame where
columns are time bins (formatted as "HH:MM:SS") and values
are detection counts.
bin_species <- function(dat, species_code, bin_mins = 30) {
dat %>%
filter(species == species_code) %>%
mutate(
timestamp = mdy_hm(timestamp),
bin = floor_date(timestamp, "hour") +
minutes(floor(minute(timestamp) / bin_mins) * bin_mins),
hour_min_sec = format(as.POSIXct(bin), "%H:%M:%S")
) %>%
count(hour_min_sec, name = "count") %>%
tidyr::pivot_wider(
names_from = hour_min_sec,
values_from = count,
values_fill = 0
)
}3.2. Create binned rows for each cervid species
Here we generate one binned row per species:
mule_deer <- bin_species(Sim_dat, "hemionus") # Mule deer
elk <- bin_species(Sim_dat, "canadensis") # Elk
wtd <- bin_species(Sim_dat, "virginianus") # White-tailed deer3.3. Combine species rows into the rosario input
matrix
We bind the three species rows into a single matrix. Any missing time bins are filled with zeros by the binning function above, and we ensure all entries are numeric.
binned_df <- dplyr::bind_rows(
MuleDeer = mule_deer,
Elk = elk,
WTD = wtd,
.id = "species"
)
# Convert to a numeric matrix (rows = species; columns = time bins)
rownames(binned_df) <- binned_df$species
data_matrix <- binned_df %>%
select(-species) %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
rownames(data_matrix) <- binned_df$species
dim(data_matrix)3.4. Optional: rescale counts to proportions
Users may analyze either counts or proportions (e.g., to compare
species with very different sample sizes). The function
rescale_matrix() rescales rows so each row sums to 1.
data_matrix_prop <- rescale_matrix(data_matrix)
rowSums(data_matrix_prop)4. Running rosario on the real-world dataset
This section demonstrates how to run the rosario
workflow using the matrix created above.
4.1. Create cyclic shifts and mirror images
The rosario() function creates the set of cyclic shifts
and their mirror images (reverse order), preserving shape while changing
location along the cycle and maintaining temporal autocorrelation. The
suite of vectors and mirrors represents a complete set of possible
distributions.
4.2. Visualize patterns generated by rosario()
function
Use plot_rosario() to visualize hypothetical time-use
distributions produced by rosario() for a single biological
identity. The function plots the first 10 cyclic shifts and their mirror
images. Each panel shows one hypothetical time-use distribution, with
the cyclic shift in dark gray and its mirror image in dark red.
plot_rosario(data_matrix[1, ], cols = 5) # example: visualize shifts for Mule deer4.3. Assemblage-wide temporal overlap
Use temp_overlap() to compute the mean of all pairwise
overlaps among rows (species), using the chosen index: “pianka” or
“czekanowski”. This returns the observed assemblage-wide overlap among
the cervid species.
Results_Pianka <- temp_overlap(data_matrix, method = "pianka")
Results_Pianka
Results_Czekanowski <- temp_overlap(data_matrix, method = "czekanowski")
Results_Czekanowski4.4. Null Model Test
To test whether the observed overlap differs from random
expectations, get_null_model() generates a null
distribution of assemblage-wide overlap by repeatedly randomizing the
input matrix using rosario_sample() and recomputing mean
pairwise overlap (via temp_overlap()).
get_null_model() can run simulations in parallel by
setting parallel = TRUE. For CRAN checks and maximum
portability, this vignette uses parallel = FALSE.
We recommend using nsim = 1000 for applied analyses, but
runtime depends on hardware and dataset size.
set.seed(1)
Null_Model_Pianka <- get_null_model(data_matrix, method = "pianka", nsim = 100, parallel = FALSE)
Null_Model_Pianka$p_value
Null_Model_Czekanowski <- get_null_model(data_matrix, method = "czekanowski", nsim = 100, parallel = FALSE)
Null_Model_Czekanowski$p_value4.5. Visualizing the null distribution
Use temp_overlap_plot() to compare the observed overlap
(from temp_overlap()) against the simulated null
distribution (from get_null_model()). The plot shows a
histogram of simulated overlap values and a vertical line indicating the
observed overlap.
temp_overlap_plot(Null_Model_Pianka)
temp_overlap_plot(Null_Model_Czekanowski)5. Conclusion
This vignette demonstrates the key steps needed to analyze cyclical
temporal data using rosario: downloading event-level
detections, binning them into time intervals, building a species-by-time
matrix, computing observed assemblage-wide overlap, and comparing
observed overlap against a null model.
For more details on each function, see the package documentation and the examples in each function’s help page.