Global Discretized Rarity Pipeline
gdrare_pipeline.Rd
Runs the full global model of discretized rarity (GDR) and its restrictions for a set of species, from model preparation to rarity axis calculation to final rarity type assignment.
Usage
gdrare_pipeline(
species_df,
species_col = "species",
abundance_df = NULL,
abundance = TRUE,
phylo = NULL,
use_internal_phylo = TRUE,
internal_phylo_name = "ALLMB",
trait_cols = NULL,
additional_dimensions = NULL,
use_most_complete_model = FALSE,
geo_rarity_method = c("taxonomic", "range"),
fun_rarity_method = c("min_distance", "mean_distance", "none"),
k_means = FALSE,
time = FALSE,
time_slices = NULL,
relative = TRUE,
min_dbscan_points = 5,
min_dbscan_distance = 1,
gbif_limit = 2000,
num_cores = 1,
site_col = "site",
abundance_col = "abundance",
slope_factor = 1,
model = NULL,
thresholds = list(GR = 0.15, GL = 0.15, FR = 0.9, FL = 0.9, PR = 0.9, PL = 0.9),
directions = list(GR = "low", GL = "low", FR = "high", FL = "high", PR = "high", PL = "high")
)
Arguments
- species_df
A data frame containing species-level data, including species names and optionally trait, abundance, or site information.
- species_col
The name of the column in species_df containing species names. Default is "species".
- abundance_df
Optional data frame containing site-by-species abundance data.
- abundance
Logical; if TRUE, include abundance weighting when calculating community rarity metrics. Default is TRUE.
- phylo
A phylogenetic tree of class phylo. If NULL and use_internal_phylo = TRUE, a default phylogeny will be used.
- use_internal_phylo
Logical; if TRUE (default) and no phylo is provided, the function will attempt to retrieve an internal seed plant phylogeny.
- internal_phylo_name
Character; name of the internal tree to use (e.g., "ALLMB").
- trait_cols
Character vector specifying the trait columns in species_df to be used in functional rarity calculations.
- additional_dimensions
Optional list of additional custom rarity axes to include in the analysis.
- use_most_complete_model
Logical; if TRUE, prioritize using the rarity model that maximizes species coverage across all rarity dimensions. Default is FALSE.
- geo_rarity_method
Method to calculate geographic rarity. Options: "taxonomic" (based on site occupancy), or "range" (based on convex hull/range size). Default is "taxonomic".
- fun_rarity_method
Method for functional rarity. Options: "min_distance", "mean_distance", or "none" to exclude functional rarity. Default is "min_distance".
- k_means
Logical; if TRUE, use k-means clustering to define rarity thresholds. Default is FALSE.
- time
Logical; if TRUE, run rarity analyses for each time slice (e.g., for paleo data). Default is FALSE.
- time_slices
Optional vector of time points to use if time = TRUE.
- relative
Logical; whether to use relative distinctiveness or rarity scores (e.g., scaled within communities). Default is TRUE.
- min_dbscan_points
Minimum number of points required for DBSCAN range calculation. Used when estimating regional geographic rarity from occurrence data. Default is 5.
- min_dbscan_distance
Minimum distance (in decimal degrees) used in DBSCAN-based convex hull calculations. Default is 1.
- gbif_limit
Maximum number of records to pull per species from GBIF when estimating range-based regional geographic rarity. Default is 2000.
- num_cores
Number of cores to use for parallel computation. Default is 1.
- site_col
Column name in abundance_df that identifies sites. Default is "site".
- abundance_col
Column name in abundance_df that contains species abundance values. Default is "abundance".
- slope_factor
A numeric value passed to find_optimal_k() to adjust the steepness of the elbow curve used for determining optimal number of clusters. Default is 1.
- model
Optional character string naming a specific rarity model to use. If NULL, models will be inferred based on available data.
- thresholds
A named list of numeric values giving the percentile cutoff for each axis, expressed as a proportion (e.g., 0.15 for the 15th percentile). Default thresholds mark the bottom 15% for geographic rarity and the top 10% for functional and phylogenetic rarity. Thresholds for custom rarity dimensions must be supplied by the user.
- directions
A named list specifying whether rarity is associated with "low" or "high" values for each dimension. If unspecified, the function defaults to "low" for "GR" and "GL", and "high" for all other axes. Users can specify only a subset of axes to override the defaults.
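The interaction between thresholds and directions can be pictured with a small base R sketch. This illustrates the idea only and is not the package's internal code: flag_rare() is a hypothetical helper, the scores are made up, and the use of stats::quantile() with inclusive comparisons is an assumption. The last lines show one way a partial directions list could be merged over the documented defaults.

```r
# Made-up scores for a single rarity axis (illustration only).
scores <- c(sp_a = 0.05, sp_b = 0.40, sp_c = 0.75, sp_d = 0.95, sp_e = 0.60)

# Hypothetical helper, NOT part of the package: flag species whose score
# falls beyond a percentile cutoff in the requested direction.
flag_rare <- function(x, threshold, direction = c("low", "high")) {
  direction <- match.arg(direction)
  cutoff <- stats::quantile(x, probs = threshold, na.rm = TRUE, names = FALSE)
  if (direction == "low") x <= cutoff else x >= cutoff
}

# "low" direction with a 0.15 threshold: the bottom 15% is flagged.
gr_flags <- flag_rare(scores, threshold = 0.15, direction = "low")

# "high" direction with a 0.9 threshold: the top 10% is flagged.
fr_flags <- flag_rare(scores, threshold = 0.90, direction = "high")

# Assumed merging behaviour for a partial override of the defaults.
default_directions <- list(GR = "low", GL = "low", FR = "high",
                           FL = "high", PR = "high", PL = "high")
directions <- utils::modifyList(default_directions, list(FR = "low"))
```

With these toy values, only the lowest-scoring species is flagged under "low" and only the highest-scoring species under "high".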
Value
A data frame with species-level rarity classifications. This includes raw values and rarity flags for each axis and rarity classifications for each restriction.
Details
This function is a high-level wrapper that runs the entire GDR workflow. It:
1. Prepares rarity restrictions and input data.
2. Calculates rarity dimensions.
3. Assigns rarity types.
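As a rough illustration of what calculating a rarity dimension can involve, the sketch below computes site occupancy, the quantity the "taxonomic" option of geo_rarity_method is described as being based on. It uses base R on a toy long-format table with the default column names; the package's actual calculation may differ (for example in scaling or abundance weighting), so treat this as an assumption-laden sketch.

```r
# Toy long-format abundance table using the default column names.
abundance <- data.frame(
  species = c("Abies_procera", "Alnus_incana", "Abies_procera",
              "Alnus_incana", "Alnus_incana"),
  site = c("A", "A", "B", "C", "D"),
  abundance = c(10, 5, 15, 3, 7)
)

# Keep only records where the species is actually present.
present <- abundance[abundance$abundance > 0, ]

# Total number of distinct sites surveyed.
n_sites <- length(unique(present$site))

# Number of distinct sites each species occupies.
sites_per_species <- tapply(present$site, present$species,
                            function(s) length(unique(s)))

# Proportion of sites occupied; lower values mean a more restricted species.
occupancy <- sites_per_species / n_sites
```

Here Abies_procera occupies 2 of 4 sites and Alnus_incana 3 of 4, so a "low" direction on this axis would rank Abies_procera as the geographically rarer of the two.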
Author
Alivia G. Nytko, anytko@vols.utk.edu
Examples
if (FALSE) { # \dontrun{
# Species-level table with two trait columns
species <- data.frame(
  species = c("Abies_procera", "Alnus_incana"),
  trait1 = c(1.2, 3.4),
  trait2 = c(1.7, 9.8)
)

# Long-format site-by-species abundance table
abundance <- data.frame(
  species = c("Abies_procera", "Alnus_incana", "Abies_procera", "Alnus_incana", "Alnus_incana"),
  site = c("A", "A", "B", "C", "D"),
  presence_absence = c(1, 1, 1, 1, 1),
  abundance = c(10, 5, 15, 3, 7)
)

# Run the full pipeline with occupancy-based geographic rarity
# and mean-distance functional rarity
classified <- gdrare_pipeline(
  species_df = species,
  abundance_df = abundance,
  trait_cols = c("trait1", "trait2"),
  geo_rarity_method = "taxonomic",
  fun_rarity_method = "mean_distance"
)
print(classified)
} # }