The effects of model and data complexity on predictions from species distributions models

How complex a model needs to be to provide useful predictions is a matter of continuous debate across the environmental sciences. In the species distributions modelling literature, studies have demonstrated that more complex models tend to provide better fits. However, studies have also shown that predictive performance does not always increase with complexity. Testing of species distributions models is challenging because independent data for testing are often lacking, but a more general problem is that model complexity has never been formally described in such studies. Here, we systematically examine predictive performance across models and data of varying complexity. We introduce the concept of computational complexity, widely used in theoretical computer science, to quantify model complexity. In addition, the complexity of species distributional data is characterized by their geometrical properties. Tests analyzed the models’ ability to predict virtual species distributions in the same region and time period used for training, and to project distributions to different time periods under climate change. Of the eight species distribution models analyzed, five (Random Forest, boosted regression trees, generalized additive models, multivariate adaptive regression splines, MaxEnt) showed similar performance despite differences in computational complexity. The ability of models to forecast distributions under climate change was also not affected by model complexity. In contrast, geometrical characteristics of the data were related to model performance in several ways: complex datasets were consistently more difficult to model, and the complexity of the data was affected by the choice of predictors and the type of data analyzed.
Given our definition of complexity, our study contradicts the widely held view that the complexity of species distributions models has significant effects on their predictive ability, while our findings support previous observations that the properties of species distributions data and their relationship with the environment are strong predictors of model success.