The decline of frogs is a bellwether for the decline of the broader ecosystem.
Frogs are an indicator species—a frog population’s health reflects the surrounding environmental
conditions, and they are sensitive to both air and water quality [1].
Unfortunately, frog populations around the world are diminishing, with some species even disappearing.
As of 2021, the International Union for Conservation of Nature Red List of Threatened Species includes
at least one third of amphibian species [2]. In Australia alone, four species of frogs are officially
listed as extinct [3], with an additional three recently found to be likely extinct [4].
To protect frogs and probe the health of their ecosystems, scientists need to understand where frogs live
and how their numbers and distribution are changing over time. To assist with this mission, we have built
a machine learning model to predict the distribution of Litoria fallax, the eastern dwarf tree frog,
using observed location data from the Global Biodiversity Information Facility (GBIF)
in combination with temperature, precipitation, and soil moisture data from the TerraClimate dataset.
Model and data description
Our final model uses a random forest classifier algorithm,
imported from scikit-learn for Python. Our target dataset consists of
labeled occurrence data for
five species of frogs in Australia, taken from the GBIF dataset provided by EY,
alongside predictor variables based on
temperature, precipitation, and soil moisture data
from the TerraClimate dataset, hosted by the Microsoft Planetary Computer.
We restricted the frog data to the period from 1 January 2015 to the present,
and climate data to the period from January 2015 to December 2019. To reduce the computational load,
rather than using the entire datasets, we sub-sampled nine 1° by 1° squares
(equivalent to ~111 km by 111 km) around the coast of Australia. After combining the sub-sampled datasets,
we dropped 43 points that included missing data (of ~40,000 total points).
As George Box wrote, “All models are wrong, but some are useful.” In designing our model,
we made some simplifying assumptions. In selecting the date range
of our data, we assumed that the distribution of L. fallax has not changed since 2015,
and that the effect of climate change on our environmental variables over the same period has not
been significant. Additionally, we used a binary classification model;
however, with our frog occurrence dataset, no confirmed absence points are provided.
Therefore, we assumed that the presence of other frog species indicated an absence of the target species
and created pseudo-absence data accordingly.
Our key innovations in improving our model over the benchmark included
a) a class balancing method that preserves isolated non-target frog occurrences,
b) simplifying features by aggregating data by month, and
c) creating features based on a variety of descriptive statistics.
a) Class balancing
After labeling the occurrences of non-target frog species as pseudo-absences, our training dataset contained 25,733 absences and 13,839 presences over all geographical areas sampled, a nearly 2:1 imbalance. The majority class, absences, is not the class of interest; this poses a challenge for predicting the presences.
We developed a method to downsample the absences while preserving isolated ones,
which may contain information that is not represented anywhere else (Figure 1).
We searched each 1/24° by 1/24° “pixel” (the TerraClimate resolution) in our regions of interest (ROI).
Where there were multiple absences per pixel, we randomly selected one to keep, removing the rest.
Because the number of kept absence points was less than the number of presence points,
we randomly sampled from the pool of removed points, adding them back into the training set until the number of
presence and absence points were equal, resulting in a dataset containing 13,839 presence points and 13,839 absence points.
Figure 1. Downsampling off-target species occurrences while preserving isolated occurrences.
A representative region of interest containing both the target frog species (blue) and off-target frog species (orange). Each 1/24° by 1/24° “pixel” (indicated by the gridlines) that contained off-target occurrences before downsampling retains at least one off-target occurrence after downsampling.
b) Feature simplification
We tested different variables from the TerraClimate dataset, and found that using
maximum temperature, minimum temperature, precipitation,
and soil moisture as the basis of our predictor variables maximized accuracy.
To preserve the seasonal variations in these measures throughout the year, we chose to
aggregate data by month of the year when performing
feature simplification.
c) Feature creation
During our exploratory data analysis, we plotted different descriptive statistics of
the maximum monthly temperature, minimum monthly temperature, precipitation, and soil moisture
near target frogs and off-target frogs. We noticed that taking the maximum, minimum, mean, and
standard deviation for each month over the timespan of 2015-2019 resulted in large differences
for areas containing target frogs versus areas containing off-target frogs in some months
(Figure 2).
Figure 2. Example plots of monthly aggregated data for one 1/24° by 1/24° square containing only L. fallax,
the target frog, and for one 1/24° by 1/24° square containing only off-target frogs.
a) Minimum monthly maximum temperature over the timespan of 2015-2019.
We observe that this feature seems to differ for the target and non-target regions in winter and early spring.
b) Mean monthly precipitation over the timespan of 2015-2019.
We observe that this feature seems to differ for the target and non-target regions in summer.
After training our model, we evaluated the feature performance using the mean decrease in impurity (MDI)
(Figure 3).
Of the 192 total features, 40 had an MDI greater than 1/192, indicating that they are important.
The top 12 features involved maximum monthly temperature in September and October, and precipitation in January,
February, and March.
Figure 3. Feature importance by mean decrease in impurity (MDI).
Top 25 most important features (out of 192 total features). Features are labeled as
“[month] [measurement] [aggregation statistic]”; e.g., “9 tmax min” indicates
the minimum of the September maximum temperature over the years 2015–2019.
Other improvements included
d) sampling over 9 different geographical areas with varying climate and frog species occurrences, and
e) hyperparameter tuning.
d) Sub-sampling different geographies
To build our training dataset, we sub-sampled nine 1° by 1° squares
(equivalent to about 111 km by 111 km), including the five evaluation regions,
around the coast of Australia (Figure 4).
We selected regions with varying climates to expose our model to a range of data:
wet tropical, humid subtropical, subtropical highlands, and temperature climates
in both rural and urban regions are included.
Figure 4. Map of Australia showing regions of interest (ROIs).
The nine 1° by 1° squares used to subsample our training data.
We included the five evaluation ROIs (colored) and added four additional regions of interest to improve the model
(black). L. fallax is marked in orange, while other species are marked in red, blue, green, and gold.
e) Hyperparameter tuning
We used our own tuning algorithm to determine the hyperparameters for the random forest classifier.
Our final model uses 250 estimators, requires 5 samples to split an internal node, and class-balances
the subsamples in the training process. Increasing the number of estimators makes the model perform better,
at the expense of training and run time, until a point of diminishing returns.
Increasing the number of samples to split an internal node reduces the error due to bias in the model,
while increasing the error due to variance in the test data. The class weighting dictates how to treat
imbalanced classes in the training dataset.
First, we measured the effect of changing the number of estimators, number of samples to split an internal node,
and class weighting independently of one another.
Then, we revisited the effect of the last two hyperparameters with the other values fixed.
Our model achieved a prediction accuracy score of 0.77 on the unseen data.
However, it achieved an in-sample F1 score of 0.87 / in-sample accuracy of 0.86,
indicating that our model is suffering from overfitting. Using stratified 10-fold cross-validation,
we measured an out-of-sample F1 score of 0.86 / out-of-sample accuracy of 0.85.
The stratified 10-fold cross-validation used the balanced dataset, but if the unseen data is unbalanced,
the performance may decrease. Therefore, we also assessed out-of-sample performance by predicting the frogs
from a non-downsampled dataset generated from the evaluation regions.
Those predictions had an F1 score of 0.76 / accuracy of 0.80. Looking at the confusion matrices (Figure 5),
we observed that most of the test data are absences. The F1 score does not consider correctly identified negatives
in its assessment, leading to the higher out-of-sample accuracy score (which considers true negatives and true positives)
than out-of-sample F1 score.
Figure 5. In-sample and out-of-sample confusion matrices.
a) In-sample results: confusion matrix from predicting frogs on the training dataset.
b) Out-of-sample results: confusion matrix from predicting frogs on the non-downsampled dataset
generated from the evaluation regions.
To reduce overfitting, we could restrict the predictor features to the top contributors to the mean decrease in impurity.
In addition, it may be possible to improve the model by using grid search cross-validation to optimize the hyperparameters,
and by particularly focusing on the size of the random subsets of features to consider when splitting a node [5].
Our model is blazingly fast in comparison to the other models we tested, such as support vector,
k-neighbors, and gradient boost classifiers. On a consumer-grade laptop (Intel i5-8250U: 8 cores at 1.60 GHz),
our model can preprocess the predictor variable data and
predict the presence of L. fallax in a 1° by 1° square of terrain in 50 ms.
To increase the speed of predicting large areas of land, each 1° by 1° ROI can be predicted in parallel.
Alternatively, a single 1° by 1° region can be further divided to the resolution of the climate data,
and each of those subdivisions can be predicted simultaneously. Finally, scaling is not needed for random forest algorithms
and so could be eliminated—we left the preprocessing step in the pipeline from testing other algorithms,
including distance-based algorithms.
Future work to increase accuracy could involve refining the predictor variable and/or the learning algorithm.
All machine learning models risk producing useless results if trained on inappropriate data.
We could consult with an amphibian expert on what factors most affect frog health and identify an improved dataset on
which to base our predictor variables. Secondly, we could incorporate data on air and water pollution,
which are known to negatively affect frogs. Finally, we could obtain weather data of increased geographical and/or
temporal resolution. Alternatively, we could continue to use the TerraClimate dataset—we noticed that a “station_influence”
parameter is included, and propose weighting TerraClimate data based on its value, assuming data reported by more stations is
more reliable. In addition, we could use a different learning algorithm. Although the random forest classifier gave us the
best accuracy, we may obtain higher accuracy with another machine learning algorithm with improved hyperparameter tuning.
Furthermore, we could try deep learning methods, such as building a neural network using TensorFlow or PyTorch.
We developed a species distribution model (SDM) for Litoria fallax in Australia based on a
random forest classifier and using predictor variables derived from the TerraClimate dataset.
Our model’s F1 score on the unseen data increased to 0.77 from the benchmark of 0.61.
In addition, it can be trained on nearly 40,000 data points quickly and predicts unseen data quickly.
Finally, although it was evaluated as a binary classifier, it can generate probabilities of frog occurrence.
Due to our SDM’s increased accuracy, it could help with identifying candidate regions in which to search for L. fallax.
Most current occurrence data obtained from GBIF are collected by citizen scientists.
It could be helpful for volunteers or scientists to search for frogs in regions with low documented occurrences,
but high probability predicted by our model.
Because the frog population data are collected from citizen scientists, there is less historical information on frog populations
in comparison to historical weather data (the TerraClimate dataset contains weather data dating back to 1958).
Our model could be applied on historical weather data to reconstruct the
L. fallax population at different moments from the past and learn how it has changed over the last 60 years,
which could inform how the species distribution may
continue to evolve. In addition, our model could be modified to predict different species of frogs or other animals,
and in other parts of the world as well. Our model can be combined with measurements and predictions of the changing climate
to extrapolate the effect of global warming on
frog and other indicator species populations--and thus, the health of ecosystems.
[1] Daly, Natasha. “These animals offer key clues for environmental change,” National Geographic (2021-09-17).
[2] IUCN. “The IUCN Red List of Threatened Species™” (2021-12-09).
[3] Australian Government Department of Agriculture, Water and the Environment. “EPBC Act List of Threatened Fauna” (accessed 2022-06-11).
[4] Geyle, H. M., et al. “Red hot frogs: identifying the Australian frogs most at risk of extinction,” Pacific Conservation Biology (2022).
[5] scikit-learn 1.1.1. “1.11 Ensemble methods” (accessed 2022-06-11).
Litoria fallax illustrations by Lealia Xiong.
Datasets were acquired as part of the EY Data Challenge.
Weather data sourced from the TerraClimate dataset hosted by the Microsoft Planetary Computer.
Occurrence data orignally sourced from the Global Biodiversity Information Facility.
All data analysis and visualization performed in Python, with some help from functions provided in the
EY example notebook
.
frog-discovery is maintained by lealiaxiong.