Global Discretized Rarity Pipeline

Runs the full global model of discretized rarity (GDR) and its restrictions for a set of species, from model preparation to rarity axis calculation to final rarity type assignment.

Usage

gdrare_pipeline(
  species_df,
  species_col = "species",
  abundance_df = NULL,
  abundance = TRUE,
  phylo = NULL,
  use_internal_phylo = TRUE,
  internal_phylo_name = "ALLMB",
  trait_cols = NULL,
  additional_dimensions = NULL,
  use_most_complete_model = FALSE,
  geo_rarity_method = c("taxonomic", "range"),
  fun_rarity_method = c("min_distance", "mean_distance", "none"),
  k_means = FALSE,
  time = FALSE,
  time_slices = NULL,
  relative = TRUE,
  min_dbscan_points = 5,
  min_dbscan_distance = 1,
  gbif_limit = 2000,
  num_cores = 1,
  site_col = "site",
  abundance_col = "abundance",
  slope_factor = 1,
  model = NULL,
  thresholds = list(GR = 0.15, GL = 0.15, FR = 0.9, FL = 0.9, PR = 0.9, PL = 0.9),
  directions = list(GR = "low", GL = "low", FR = "high", FL = "high", PR = "high", PL =
    "high")
)

Arguments

species_df: A data frame containing species-level data, including species names and optionally trait, abundance, or site information.
species_col: The name of the column in species_df containing species names. Default is "species".
abundance_df: Optional data frame containing site-by-species abundance data.
abundance: Logical; if TRUE, include abundance weighting when calculating community rarity metrics. Default is TRUE.
phylo: A phylogenetic tree of class phylo. If NULL and use_internal_phylo = TRUE, a default phylogeny will be used.
use_internal_phylo: Logical; if TRUE (default) and no phylo is provided, the function will attempt to retrieve an internal seed plant phylogeny.
internal_phylo_name: Character; name of the internal tree to use (e.g., "ALLMB").
trait_cols: Character vector specifying the trait columns in species_df to be used in functional rarity calculations.
additional_dimensions: Optional list of additional custom rarity axes to include in the analysis.
use_most_complete_model: Logical; if TRUE, prioritize using the rarity model that maximizes species coverage across all rarity dimensions. Default is FALSE.
geo_rarity_method: Method used to calculate geographic rarity. Options: "taxonomic" (based on site occupancy) or "range" (based on convex hull/range size). Default is "taxonomic".
fun_rarity_method: Method for functional rarity. Options: "min_distance", "mean_distance", or "none" to exclude functional rarity. Default is "min_distance".
k_means: Logical; if TRUE, use k-means clustering to define rarity thresholds. Default is FALSE.
time: Logical; if TRUE, run rarity analyses for each time slice (e.g., for paleo data). Default is FALSE.
time_slices: Optional vector of time points to use if time = TRUE.
relative: Logical; whether to use relative distinctiveness or rarity scores (e.g., scaled within communities). Default is TRUE.
min_dbscan_points: Minimum number of points required for DBSCAN range calculation. Used when estimating regional geographic rarity from occurrence data. Default is 5.
min_dbscan_distance: Minimum distance (in decimal degrees) used in DBSCAN-based convex hull calculations. Default is 1.
gbif_limit: Maximum number of records to pull per species from GBIF when estimating range-based regional geographic rarity. Default is 2000.
num_cores: Number of cores to use for parallel computation. Default is 1.
site_col: Column name in abundance_df that identifies sites. Default is "site".
abundance_col: Column name in abundance_df that contains species abundance values. Default is "abundance".
slope_factor: A numeric value passed to find_optimal_k() to adjust the steepness of the elbow curve used for determining optimal number of clusters. Default is 1.
model: Optional character string naming a specific rarity model to use. If NULL, models will be inferred based on available data.
thresholds: A named list of numeric values giving the percentile cutoff for each axis. The default thresholds mark the bottom 15% for geographic rarity (GR, GL) and the top 10% for functional (FR, FL) and phylogenetic (PR, PL) rarity. Thresholds for custom rarity dimensions must be supplied by the user.
directions: A named list specifying whether rarity is associated with "low" or "high" values for each dimension. If unspecified, the function defaults to "low" for "GR" and "GL", and "high" for all other axes. Users can specify only a subset of axes to override the defaults.
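
To illustrate how a percentile threshold and a direction might combine into a rarity flag, here is a minimal base R sketch. The helper flag_rare() and the example vectors are hypothetical and are not part of the package; the pipeline's internal cutoff rule (for example, its quantile type or its handling of ties) may differ.

```r
# Hypothetical helper, not part of the package: flag values that fall
# at or beyond a percentile cutoff in the given direction.
flag_rare <- function(x, threshold, direction = c("low", "high")) {
  direction <- match.arg(direction)
  # Percentile cutoff computed with R's default quantile definition (type 7)
  cutoff <- stats::quantile(x, probs = threshold, na.rm = TRUE, names = FALSE)
  if (direction == "low") x <= cutoff else x >= cutoff
}

# Made-up axis values for ten species
gr <- c(0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.30, 0.50, 0.70, 0.10)
fr <- c(3, 7, 1, 10, 5, 2, 8, 4, 9, 6)

# Bottom 15% counts as rare for a "low" axis such as GR
gr_flag <- flag_rare(gr, threshold = 0.15, direction = "low")

# Top 10% counts as rare for a "high" axis such as FR
fr_flag <- flag_rare(fr, threshold = 0.9, direction = "high")

data.frame(gr, gr_flag, fr, fr_flag)
```

Under this reading, a "low" axis with threshold 0.15 flags species at or below the 15th percentile, while a "high" axis with threshold 0.9 flags species at or above the 90th percentile.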

Value

A data frame with species-level rarity classifications. This includes raw values and rarity flags for each axis and rarity classifications for each restriction.

Details

This function is a high-level wrapper that runs the entire GDR workflow in three steps:

1. Prepares rarity restrictions and input data.
2. Calculates rarity dimensions.
3. Assigns rarity types.

Author

Alivia G. Nytko, anytko@vols.utk.edu

Examples

if (FALSE) { # \dontrun{
species <- data.frame(species = c("Abies_procera", "Alnus_incana"), trait1 = c(1.2, 3.4), trait2 = c(1.7, 9.8))
abundance <- data.frame(species = c("Abies_procera", "Alnus_incana", "Abies_procera", "Alnus_incana", "Alnus_incana"), site = c("A", "A", "B", "C", "D"), presence_absence = c(1, 1, 1, 1, 1), abundance = c(10, 5, 15, 3, 7))

classified <- gdrare_pipeline(species_df = species, abundance_df = abundance, trait_cols = c("trait1", "trait2"), geo_rarity_method = "taxonomic", fun_rarity_method = "mean_distance")

print(classified)
} # }
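
The "taxonomic" option of geo_rarity_method is described above as being based on site occupancy. As a rough illustration of what such a measure could look like, the base R sketch below computes the proportion of sites each species occupies, using the same abundance data as the example. This is an assumption about the general idea, not the package's actual calculation, and the variable names are illustrative only.

```r
# Same site-by-species records as in the example above
# (the presence_absence column is omitted because it is not needed here)
abundance <- data.frame(
  species = c("Abies_procera", "Alnus_incana", "Abies_procera",
              "Alnus_incana", "Alnus_incana"),
  site = c("A", "A", "B", "C", "D"),
  abundance = c(10, 5, 15, 3, 7)
)

# Keep only records where the species is actually present
present <- abundance[abundance$abundance > 0, ]

# Total number of distinct sites surveyed
n_sites <- length(unique(abundance$site))

# Number of distinct sites each species occupies
sites_per_species <- tapply(present$site, present$species,
                            function(s) length(unique(s)))

# Occupancy as a proportion of all sites; lower values mean a more
# geographically restricted species under this simple reading
occupancy <- sites_per_species / n_sites
print(occupancy)
```

Here Abies_procera occupies two of the four sites and Alnus_incana three, so the former would be the more geographically restricted of the two under this measure.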