Impacts of sample ratio and size on the performance of random forest model to predict the potential distribution of snail habitats

Submitted: 13 September 2022
Accepted: 8 February 2023
Published: 3 July 2023

Abstract

Few studies have considered how sample size and the ratio of presence to absence points affect the performance of random forest (RF) models. We applied RF to predict the spatial distribution of snail habitats using a total of 15,000 sample points (5,000 presence samples and 10,000 control points). RF models were built using seven different sample ratios (1:1, 1:2, 1:3, 1:4, 2:1, 3:1 and 4:1), and the optimal ratio was identified using the area under the curve (AUC) statistic. The impact of sample size was then compared across RF models built at the optimal ratio, and the optimal sample size was identified. For small sample sizes, the ratios 1:1, 1:2 and 1:3 performed significantly better than the ratios 4:1 and 3:1 at all four sample-size levels (p<0.01), with no significant difference among 1:1, 1:2 and 1:3 (p>0.05). For relatively large sample sizes, the 1:2 ratio appeared to be optimal, with the lowest quartile deviation. In addition, increasing the sample size produced a higher AUC and a smaller slope, and the most suitable sample size found in this study was 2,400 (AUC=0.96). This study offers a feasible approach for selecting an appropriate sample size and sample ratio for ecological niche modelling (ENM) and provides a scientific basis for sample selection to accurately identify and predict snail habitat distributions.
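The sampling design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the ratios are read as presence:absence, uses placeholder integer IDs in place of real sample points, and leaves out the RF fitting step entirely. The helper names (`split_counts`, `draw_sample`, `auc`) are hypothetical. The AUC is computed here from its rank interpretation, which may differ from the software the study used.

```python
import random

# Ratios named in the abstract, read here as presence:absence (an assumption).
RATIOS = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (3, 1), (4, 1)]


def split_counts(total, ratio):
    """Split a total sample size into (n_presence, n_absence) for a given ratio."""
    presence_part, absence_part = ratio
    n_presence = round(total * presence_part / (presence_part + absence_part))
    return n_presence, total - n_presence


def draw_sample(presence_pool, absence_pool, total, ratio, rng):
    """Draw presence and absence points without replacement at the given ratio."""
    n_presence, n_absence = split_counts(total, ratio)
    if n_presence > len(presence_pool) or n_absence > len(absence_pool):
        raise ValueError("pool too small for this sample size and ratio")
    return rng.sample(presence_pool, n_presence), rng.sample(absence_pool, n_absence)


def auc(presence_scores, absence_scores):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Placeholder pools matching the sizes in the abstract: 5,000 presence and
# 10,000 control points. Real pools would hold point records with covariates.
rng = random.Random(0)
presence_pool = list(range(5000))
absence_pool = list(range(5000, 15000))

# Presence/absence counts for each ratio at a total sample size of 2,400.
designs = {ratio: split_counts(2400, ratio) for ratio in RATIOS}
```

In a full workflow, each drawn sample would be used to fit a classifier, and `auc` would be applied to the predicted scores of held-out presence and absence points. Under the presence:absence reading, the pool sizes cap the feasible designs: a 4:1 ratio needs more than 5,000 presence points once the total exceeds 6,250, and `draw_sample` raises an error in that case.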

How to Cite

Liu, Y., Zhang, J., Ward, M. P., Tu, W., Yu, L., Shi, J., Hu, Y., Gao, F., Cao, Z., & Zhang, Z. (2023). Impacts of sample ratio and size on the performance of random forest model to predict the potential distribution of snail habitats. Geospatial Health, 18(2). https://doi.org/10.4081/gh.2023.1151