
Runs the full global model of discretized rarity (GDR) and its restrictions for a set of species, from model preparation through rarity-axis calculation to final rarity-type assignment.

Usage

gdrare_pipeline(
  species_df,
  species_col = "species",
  abundance_df = NULL,
  abundance = TRUE,
  phylo = NULL,
  use_internal_phylo = TRUE,
  internal_phylo_name = "ALLMB",
  trait_cols = NULL,
  additional_dimensions = NULL,
  use_most_complete_model = FALSE,
  geo_rarity_method = c("taxonomic", "range"),
  fun_rarity_method = c("min_distance", "mean_distance", "none"),
  k_means = FALSE,
  time = FALSE,
  time_slices = NULL,
  relative = TRUE,
  min_dbscan_points = 5,
  min_dbscan_distance = 1,
  gbif_limit = 2000,
  num_cores = 1,
  site_col = "site",
  abundance_col = "abundance",
  slope_factor = 1,
  model = NULL,
  thresholds = list(GR = 0.15, GL = 0.15, FR = 0.9, FL = 0.9, PR = 0.9, PL = 0.9),
  directions = list(GR = "low", GL = "low", FR = "high", FL = "high", PR = "high",
    PL = "high")
)

Arguments

species_df

A data frame containing species-level data, including species names and optionally trait, abundance, or site information.

species_col

The name of the column in species_df containing species names. Default is "species".

abundance_df

Optional data frame of species abundances by site, in long format: one row per species-site record, with sites in site_col and abundances in abundance_col.

abundance

Logical; if TRUE, include abundance weighting when calculating community rarity metrics. Default is TRUE.

phylo

A phylogenetic tree of class phylo. If NULL and use_internal_phylo = TRUE, a default phylogeny will be used.

use_internal_phylo

Logical; if TRUE (default) and no phylo is provided, the function will attempt to retrieve an internal seed plant phylogeny.

internal_phylo_name

Character; name of the internal tree to use. Default is "ALLMB".

trait_cols

Character vector specifying the trait columns in species_df to be used in functional rarity calculations.

additional_dimensions

Optional list of additional custom rarity axes to include in the analysis.

use_most_complete_model

Logical; if TRUE, prioritize using the rarity model that maximizes species coverage across all rarity dimensions. Default is FALSE.

geo_rarity_method

Method used to calculate geographic rarity: "taxonomic" (based on site occupancy) or "range" (based on convex hull range size). Default is "taxonomic".

fun_rarity_method

Method for functional rarity. Options: "min_distance", "mean_distance", or "none" to exclude functional rarity. Default is "min_distance".

k_means

Logical; if TRUE, use k-means clustering to define rarity thresholds. Default is FALSE.

time

Logical; if TRUE, run the rarity analyses for each time slice (e.g., for paleoecological data). Default is FALSE.

time_slices

Optional vector of time points to use if time = TRUE.

relative

Logical; if TRUE, use relative distinctiveness or rarity scores (e.g., scaled within communities) rather than unscaled scores. Default is TRUE.

min_dbscan_points

Minimum number of points required for DBSCAN range calculation. Used when estimating regional geographic rarity from occurrence data. Default is 5.

min_dbscan_distance

Minimum distance (in decimal degrees) used in DBSCAN-based convex hull calculations. Default is 1.

gbif_limit

Maximum number of records to pull per species from GBIF when estimating range-based regional geographic rarity. Default is 2000.

num_cores

Number of cores to use for parallel computation. Default is 1.

site_col

Column name in abundance_df that identifies sites. Default is "site".

abundance_col

Column name in abundance_df that contains species abundance values. Default is "abundance".

slope_factor

A numeric value passed to find_optimal_k() to adjust the steepness of the elbow curve used to determine the optimal number of clusters. Default is 1.

model

Optional character string naming a specific rarity model to use. If NULL (default), the model is inferred from the available data.

thresholds

A named list of percentile cutoffs (proportions between 0 and 1), one per axis. The defaults flag the bottom 15% of species for geographic rarity (GR, GL) and the top 10% for functional (FR, FL) and phylogenetic (PR, PL) rarity. Thresholds for custom dimensions supplied through additional_dimensions must be set by the user.

directions

A named list specifying whether rarity is associated with "low" or "high" values for each dimension. If unspecified, the function defaults to "low" for "GR" and "GL", and "high" for all other axes. Users can specify only a subset of axes to override the defaults.
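To illustrate how a percentile threshold and a direction combine, the base R sketch below flags species on two axes using the default cutoffs for GR and FR. The helper flag_rare() and the scores are hypothetical, and the package may compute cutoffs differently (for example, with another quantile definition, or with k-means clustering when k_means = TRUE).

```r
# Illustrative sketch only: flag_rare() is a hypothetical helper, not part of
# the package, and may not match how gdrare_pipeline() applies thresholds.
flag_rare <- function(values, threshold, direction = c("low", "high")) {
  direction <- match.arg(direction)
  # Percentile cutoff on this axis (e.g. 0.15 -> 15th percentile),
  # using R's default quantile definition (type 7)
  cutoff <- stats::quantile(values, probs = threshold, na.rm = TRUE, names = FALSE)
  # "low": rare if at or below the cutoff; "high": rare if at or above it
  if (direction == "low") values <= cutoff else values >= cutoff
}

# Made-up scores on two axes for ten species
scores <- data.frame(
  GR = c(0.02, 0.10, 0.25, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 1.00),
  FR = c(0.30, 0.95, 0.10, 0.20, 0.40, 0.50, 0.60, 0.70, 0.80, 0.35)
)
thresholds <- list(GR = 0.15, FR = 0.9)
directions <- list(GR = "low", FR = "high")

# Apply each axis' cutoff and direction to its column of scores
flags <- as.data.frame(Map(flag_rare, scores[names(thresholds)],
                           thresholds, directions[names(thresholds)]))
flags
```

With these made-up values, species 1 and 2 fall at or below the 15th percentile of GR, and species 2 is the only one at or above the 90th percentile of FR.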

Value

A data frame of species-level rarity classifications, including the raw value and rarity flag for each axis and the rarity classification under each restriction.

Details

This function is a high-level wrapper that runs the entire GDR workflow. It:

  1. Prepares rarity restrictions and input data.

  2. Calculates rarity dimensions.

  3. Assigns rarity types.
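Step 2 can be pictured with simplified stand-ins for two axes. The base R sketch below computes the share of sites each species occupies (in the spirit of geo_rarity_method = "taxonomic") and each species' minimum and mean Euclidean distance to the other species in trait space (in the spirit of "min_distance" and "mean_distance"). These formulas are assumptions for illustration, not the package's implementation, which may standardize traits, weight by abundance, or rescale scores when relative = TRUE. The species sp_a, sp_b, sp_c and their trait values are hypothetical.

```r
# Illustrative sketch only: simplified stand-ins for two rarity axes, not the
# package's implementation (which may standardize, weight, or rescale scores).

# Geographic axis: share of all sites at which each species occurs
abundance <- data.frame(
  species = c("Abies_procera", "Alnus_incana", "Abies_procera",
              "Alnus_incana", "Alnus_incana"),
  site = c("A", "A", "B", "C", "D"),
  abundance = c(10, 5, 15, 3, 7)
)
n_sites <- length(unique(abundance$site))
occupancy <- tapply(abundance$site, abundance$species,
                    function(s) length(unique(s)) / n_sites)

# Functional axis: Euclidean distances between species in raw trait space
# (hypothetical species and trait values; no trait standardization)
traits <- matrix(c(1.2, 3.4, 1.5,
                   1.7, 9.8, 2.0),
                 ncol = 2,
                 dimnames = list(c("sp_a", "sp_b", "sp_c"), c("trait1", "trait2")))
d <- as.matrix(dist(traits))
diag(d) <- NA  # ignore each species' zero distance to itself
min_distance <- apply(d, 1, min, na.rm = TRUE)    # nearest-neighbour distance
mean_distance <- apply(d, 1, mean, na.rm = TRUE)  # mean distance to all others

occupancy
round(min_distance, 3)
round(mean_distance, 3)
```

Here sp_b, which sits far from the other two species in trait space, has the largest minimum distance, while Abies_procera occupies fewer sites (2 of 4) than Alnus_incana (3 of 4).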

Author

Alivia G. Nytko, anytko@vols.utk.edu

Examples

if (FALSE) { # \dontrun{
species <- data.frame(species = c("Abies_procera", "Alnus_incana"),
                      trait1 = c(1.2, 3.4), trait2 = c(1.7, 9.8))
abundance <- data.frame(species = c("Abies_procera", "Alnus_incana", "Abies_procera", "Alnus_incana", "Alnus_incana"),
                        site = c("A", "A", "B", "C", "D"),
                        presence_absence = c(1, 1, 1, 1, 1),
                        abundance = c(10, 5, 15, 3, 7))

classified <- gdrare_pipeline(species_df = species, abundance_df = abundance,
                              trait_cols = c("trait1", "trait2"),
                              geo_rarity_method = "taxonomic",
                              fun_rarity_method = "mean_distance")

print(classified)
} # }