Data projects: dengue competition

Dengue fever is a mosquito-borne disease that occurs in tropical and subtropical parts of the world. In mild cases, the symptoms are similar to those of the flu: fever, rash, and muscle and joint pain. In severe cases, dengue fever can cause heavy bleeding, low blood pressure, and even death.

Because it is carried by mosquitoes, the transmission dynamics of dengue are related to climate variables such as temperature and precipitation. Although the relationship to climate is complex, a growing number of scientists argue that climate change is likely to produce distributional shifts that will have significant public health implications worldwide.

Using environmental data collected by various U.S. federal government agencies (from the Centers for Disease Control and Prevention to the National Oceanic and Atmospheric Administration in the U.S. Department of Commerce), can you predict the number of dengue fever cases reported each week in San Juan, Puerto Rico, and Iquitos, Peru?


  • Current rank: 875 / 7338 (MAE: 25.32)
  • Programming language: R
  • Used packages: rstudioapi, dplyr, lubridate, ggplot2, randomForest, caret, rpart, rpart.plot, gridExtra, plotly, zoo, gdata, tidyr, reshape2, weathermetrics
  • Main problems: –
  • Next steps: 
    • Use dedicated R packages, such as MICE or Amelia, to impute missing values
    • Time series analysis
    • Deeper analysis of the outliers
    • A better way of splitting the data into training and test sets
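Because the weekly case counts form a time series, one common alternative to a random split is a chronological one, where the model is only evaluated on weeks that come after its training data. The base-R sketch below illustrates that idea together with the mean absolute error (MAE) used for the score above. It is not the code behind this submission: the helper names `mae` and `time_split`, the 80/20 fraction, the column names `week_start_date` and `total_cases`, and the toy case counts are all hypothetical.

```r
# Sketch only: a chronological train/test split plus mean absolute error (MAE).
# Helper names, column names and the toy data are hypothetical.

# Mean absolute error between observed and predicted weekly case counts
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Sort rows by `date_col` and put the earliest `train_frac` share into the
# training set; all later weeks form the test set, so no week after the
# evaluation period is used to train the model.
time_split <- function(df, date_col, train_frac = 0.8) {
  df <- df[order(df[[date_col]]), , drop = FALSE]
  n_train <- floor(nrow(df) * train_frac)
  is_train <- seq_len(nrow(df)) <= n_train
  list(
    train = df[is_train, , drop = FALSE],
    test  = df[!is_train, , drop = FALSE]
  )
}

# Toy weekly data (hypothetical values)
weeks <- data.frame(
  week_start_date = seq(as.Date("2000-01-01"), by = "week", length.out = 10),
  total_cases     = c(4, 5, 4, 3, 6, 2, 4, 5, 10, 6)
)

splits <- time_split(weeks, "week_start_date", train_frac = 0.8)

# Naive baseline: predict the training-set mean for every test week
baseline <- rep(mean(splits$train$total_cases), nrow(splits$test))
mae(splits$test$total_cases, baseline)  # 3.875
```

Predicting the training mean is only a naive baseline; any fitted model can take its place, and the same split can be applied separately to the San Juan and Iquitos data.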