Random forest variable selection in spatial malaria transmission modelling in Mpumalanga Province, South Africa

  • Thandi Kapwata | thandi.kapwata@mrc.ac.za Biostatistics Unit, South African Medical Research Council, Cape Town; School of Agriculture, Earth, and Environmental Sciences, University of KwaZulu-Natal, Durban, South Africa.
  • Michael T. Gebreslasie | School of Agriculture, Earth, and Environmental Sciences, University of KwaZulu-Natal, Durban, South Africa.

Abstract

Malaria is an environmentally driven disease. To quantify the spatial variability of malaria transmission, it is imperative to understand the interactions between environmental variables and malaria epidemiology at a micro-geographic level using a novel statistical approach. The random forest (RF) statistical learning method, a relatively new variable-importance ranking method, measures the importance of potentially influential variables through the percent increase in the mean squared error: the larger this increase, the greater the relative importance of the associated variable. The principal aim of this study was to create predictive malaria maps for the Ehlanzeni District of Mpumalanga Province, South Africa, using variables selected by the RF algorithm. Of the seven environmental variables used [temperature, lag temperature, rainfall, lag rainfall, humidity, altitude, and the normalized difference vegetation index (NDVI)], altitude was identified as the most influential predictor variable due to its high selection frequency. It was selected as the top predictor for 4 out of 12 months of the year, followed by NDVI, temperature and lag rainfall, which were each selected twice. The combination of climatic variables that produced the highest prediction accuracy was altitude, NDVI, and temperature, which suggests that these three variables have high predictive capability in relation to malaria transmission. Furthermore, it is anticipated that the predictive maps generated from the RF predictions could be used to monitor the progression of malaria and to assist in intervention and prevention efforts.
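The importance measure described in the abstract can be illustrated with a minimal Python sketch of the permutation idea: shuffle one predictor, re-score the model, and report the percent increase in mean squared error. This is not the authors' code. The data are synthetic, the column names are borrowed only for readability, and the `predict` function is a hypothetical stand-in for a fitted random forest. Random forest implementations usually compute this quantity on out-of-bag samples per tree and may scale it differently, so the numbers here are illustrative only.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def percent_increase_mse(predict, rows, targets, column, rng):
    """Percent increase in MSE after shuffling one predictor column."""
    baseline = mse(targets, [predict(r) for r in rows])
    # Shuffle the chosen column to break its link with the target.
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [
        r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)
    ]
    permuted = mse(targets, [predict(r) for r in permuted_rows])
    return 100.0 * (permuted - baseline) / baseline


# Synthetic data (assumption): three predictors, only two affect the target.
rng = random.Random(0)
names = ("altitude", "ndvi", "unrelated")
rows = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
targets = [3.0 * a + 1.0 * n + rng.gauss(0.0, 0.1) for a, n, _ in rows]


def predict(row):
    """Hypothetical stand-in for a trained random forest's prediction."""
    return 3.0 * row[0] + 1.0 * row[1]


scores = {
    name: percent_increase_mse(predict, rows, targets, i, rng)
    for i, name in enumerate(names)
}
# Rank predictors from most to least important.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setup the predictor with the strongest effect on the target ranks first, and a predictor the model ignores scores zero because shuffling it leaves every prediction unchanged.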

Author Biography

Thandi Kapwata, Biostatistics Unit, South African Medical Research Council, Cape Town; School of Agriculture, Earth, and Environmental Sciences, University of KwaZulu-Natal, Durban
Senior GIS Technologist at the South African Medical Research Council
Published
2016-11-16
Section
Original Articles
Keywords:
Random forest, Modelling malaria transmission, South Africa

How to Cite
Kapwata, T., & Gebreslasie, M. (2016). Random forest variable selection in spatial malaria transmission modelling in Mpumalanga Province, South Africa. Geospatial Health, 11(3). https://doi.org/10.4081/gh.2016.434