A review of the heterogeneous landscape of biodiversity databases: Opportunities and challenges for a synthesized biodiversity knowledge base

Aim: Addressing global environmental challenges requires access to biodiversity data across wide spatial, temporal and taxonomic scales. Availability of such data has increased exponentially recently with the proliferation of biodiversity databases. However, heterogeneous coverage, protocols, and standards have hampered integration among these databases. To stimulate the next stage of data integration, here we present a synthesis of major databases, and investigate (a) how the coverage of databases varies across taxonomy, space, and record type; (b) what degree of integration is present among databases; (c) how integration of databases can increase biodiversity knowledge; and (d) the barriers to database integration. Location: Global. Time period: Contemporary. Major taxa studied: Plants and vertebrates. Methods: We reviewed 12 established biodiversity databases that mainly focus on geographic distributions and functional traits at global scale. We synthesized information from these databases to assess the status of their integration and major knowledge gaps and barriers to full integration. We estimated how improved integration can increase the data coverage for terrestrial plants and vertebrates. Results: Every database reviewed had a unique focus of data coverage. Exchanges of biodiversity information were common among databases, although not always clearly documented. Functional trait databases were more isolated than those pertaining to species distributions. Variation and potential incompatibility of taxonomic systems used by different databases posed a major barrier to data integration. We found that integration of distribution databases could lead to increased taxonomic coverage that corresponds to 23 years’ advancement in data accumulation, and improvement in taxonomic coverage could be as high as 22.4% for trait databases. Main conclusions: Rapid increases in biodiversity knowledge can be achieved through the integration of databases, providing the data necessary to address critical environmental challenges. Full integration across databases will require tackling the major impediments to data integration: taxonomic incompatibility, lags in data exchange, barriers to effective data synchronization, and isolation of individual initiatives.