New Machine Learning Method Predicts Possible Additions to Global List of Threatened Plant Species

News Release, University of Maryland

COLLEGE PARK, Md. – A new method co-developed by Anahí Espíndola, an assistant professor of entomology at the University of Maryland, uses the power of machine learning and open-access data to predict species that could be eligible for at-risk status. The research team created and trained a machine learning algorithm to assess more than 150,000 species of plants from all corners of the world, making their project among the largest assessments of conservation risk to date.

According to their results, published online in the Proceedings of the National Academy of Sciences on December 3, 2018, more than 10 percent of these species are highly likely to qualify for an at-risk classification on the International Union for Conservation of Nature’s (IUCN) Red List of Threatened Species. This list sorts species into one of five categories, from least concern to critically endangered. It is a powerful tool for researchers and policymakers working to stem the tide of species loss across the globe. But adding even a single species to the list is a large task, demanding countless hours of expensive, rigorous and highly specialized research.

Because of these limitations, a large number of known species have not yet been formally assessed by the IUCN for inclusion on the list. This deficit is especially apparent in plants: Only about 5 percent of all currently known plant species appear on IUCN’s Red List in any capacity.

Lead author Tara Pelletier, an assistant professor of biology at Radford University, worked with Espíndola to perform the machine learning analysis. The new algorithm they and collaborators created is a predictive model that can be applied to any grouping of species at any scale, from the entire globe to a single city park.

The researchers applied their model to the many thousands of plant species that remain unlisted by IUCN. According to the results, more than 15,000 of the species—roughly 10 percent of the total assessed by the team—have a high probability of qualifying as near-threatened, at a minimum.

Espíndola and her colleagues mapped the data and noted several major geographical trends in the model’s predictions. At-risk species tended to cluster in areas already known for their high native biodiversity, such as the Central American rainforests and southwestern Australia. The model also flagged regions such as California and the southeastern United States, which are home to a large number of endemic species, meaning that these species do not naturally occur anywhere else on Earth.

“When I first started thinking about this project, I suspected that many regions with high diversity would be well-studied and protected. But we found the opposite to be true,” Espíndola said. “Many of the high-diversity areas corresponded to regions with the highest probability of risk. When we saw the maps, we were surprised it was that clear. Endemic species also tend to be more at risk because they are usually confined to smaller areas.”

The model also flagged a few surprising areas not typically known for their biodiversity, such as the southern coast of the Arabian Peninsula, as having a high number of at-risk species. Some of the most imperiled regions have not received enough attention from researchers, according to Espíndola. She hopes that her method can help to fill in some of these knowledge gaps by identifying regions and species in need of further study.

“Let’s say you wanted to assess every species of wild bee on one continent. So you do the assessment and find that only one species is at risk. Now you’ve used all those resources to identify an area with low risk, which is still helpful, but not ideal when resources are limited. We want to help prevent that from happening,” Espíndola said. “Our analysis was global, but the model can be adapted for use at any geographic scale. Everything we’ve done is 100 percent open access, highlighting the power of publicly available data. We hope people will use our model, and we hope they point out errors and help us fix them, to make it better.”

Building a global predictive model of at-risk species

The researchers built this predictive model using open-access data from the Global Biodiversity Information Facility (GBIF) and the TRY Plant Trait Database.

Espíndola and Pelletier trained the model using GBIF and TRY data from the relatively small group of plant species already on the IUCN Red List. This allowed the researchers to assess and fine-tune the model’s accuracy by checking its predictions against the listed species’ known IUCN risk status. The Red List sorts non-extinct species into one of five classification categories: least concern, near-threatened, vulnerable, endangered and critically endangered.
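The train-then-check workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the release does not name the algorithm or the input variables, so a simple k-nearest-neighbor vote stands in for the team's model, the two features (range size and occurrence count) are hypothetical, and every number is invented rather than drawn from GBIF, TRY or the Red List.

```python
import math

# Hypothetical toy records: (log10 range size in km^2, log10 occurrence count).
# Labels: 1 = at risk (near-threatened or worse), 0 = least concern.
# All values are invented for illustration; none come from GBIF, TRY or IUCN.
LISTED = [
    ((2.0, 1.0), 1),
    ((2.5, 1.2), 1),
    ((3.0, 1.5), 1),
    ((2.2, 0.8), 1),
    ((5.5, 3.0), 0),
    ((6.0, 3.5), 0),
    ((5.0, 2.8), 0),
    ((6.2, 3.9), 0),
]

# Hypothetical species with no Red List assessment yet.
UNLISTED = {
    "species_a": (2.1, 0.9),  # small range, few records
    "species_b": (5.8, 3.2),  # wide range, many records
}


def risk_probability(features, training, k=3):
    """Fraction of the k nearest listed species that are at risk."""
    nearest = sorted(training, key=lambda rec: math.dist(features, rec[0]))[:k]
    return sum(label for _, label in nearest) / k


def leave_one_out_accuracy(records, k=3):
    """Predict each listed species from the others and compare with its known status."""
    correct = 0
    for i, (features, label) in enumerate(records):
        rest = records[:i] + records[i + 1:]
        predicted = 1 if risk_probability(features, rest, k) >= 0.5 else 0
        correct += predicted == label
    return correct / len(records)


def flag_unlisted(unlisted, training, threshold=0.8, k=3):
    """Names of unlisted species whose estimated risk meets the threshold."""
    return [name for name, features in unlisted.items()
            if risk_probability(features, training, k) >= threshold]


accuracy = leave_one_out_accuracy(LISTED)
flagged = flag_unlisted(UNLISTED, LISTED)
print(f"leave-one-out accuracy: {accuracy:.2f}")
print(f"flagged for further study: {flagged}")
```

The accuracy check mirrors the step of comparing predictions against known Red List status. The threshold in `flag_unlisted` is an arbitrary stand-in for what the release calls a "high probability" of qualifying as at risk.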

The research paper, “Predicting plant conservation priorities on a global scale,” by Tara Pelletier, Bryan Carstens, David Tank, Jack Sullivan and Anahí Espíndola, was published online in the Proceedings of the National Academy of Sciences on December 3, 2018.

This work was supported by the National Science Foundation (Award Nos. DEB-1457519, DEB-1457726 and EPS-809935), the National Institutes of Health (Award Nos. NCRR 1P20RR016454-01 and NCRR 1P20RR016448-01), DIVERSITAS/Future Earth and the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig. The content of this article does not necessarily reflect the views of these organizations.