Community-level vs species-specific approaches to model selection

A topic of particular current interest is community-level approaches to species distribution modelling (SDM), i.e. approaches that simultaneously analyse distributional data for multiple species. Previous studies have examined the advantages of community-level approaches for parameter estimation, but not for model selection, that is, the process of choosing which model (and in particular, which subset of environmental variables) to fit to data. We compared the predictive performance of models that used the same modelling method (generalised linear models) but chose the subset of variables to include either simultaneously across all species (community-level model selection) or separately for each species (species-specific model selection). Our results across two large presence/absence tree community datasets were inconclusive as to whether there was an overall difference in predictive performance between models fitted via species-specific vs community-level model selection. However, we found some evidence that the community-level approach was best suited to modelling rare species, with its performance declining as species prevalence increased. That is, when data were sparse there was more opportunity for gains from borrowing strength across species via a community-level approach. Interestingly, we also found that the community-level approach tended to work better when the model selection problem was more difficult, and that it more reliably detected noise variables that should be excluded from the model.
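
The two selection strategies compared above can be written schematically as follows. This is only an illustrative sketch, not the exact formulation used in the study. The notation is assumed here: y_{ij} is the presence/absence of species j at site i, x_{ik} is the value of candidate environmental variable k at site i, S is a subset of the p candidate variables, and C_j(S) is some model selection criterion for species j (for example an information criterion or a cross-validated predictive loss). The logit link, the species-specific coefficients and the additive form of the community-level criterion are likewise assumptions made for illustration.

```latex
% Assumed GLM for species j using variable subset S (binomial, logit link)
\[
  \operatorname{logit} \Pr(y_{ij} = 1)
    = \beta_{0j} + \sum_{k \in S} \beta_{kj}\, x_{ik},
  \qquad S \subseteq \{1, \dots, p\}.
\]

% Species-specific model selection: a separate subset for each species j
\[
  \hat{S}_j = \operatorname*{arg\,min}_{S \subseteq \{1,\dots,p\}} C_j(S),
  \qquad j = 1, \dots, m.
\]

% Community-level model selection: one subset shared by all m species
\[
  \hat{S} = \operatorname*{arg\,min}_{S \subseteq \{1,\dots,p\}} \sum_{j=1}^{m} C_j(S).
\]
```

Under this reading, the community-level criterion pools evidence about each variable across all m species, which is one way to see why it could help most for rare species, whose individual criteria C_j(S) are estimated from few presences.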