Statistical Analysis

Hypothesis

Has the US seen more natural disasters over time? Our team hypothesizes that the continental United States is seeing an increase in count of natural disasters over the years as average temperatures rise. We will consider including average temperature, total precipitation, and year in our model.

Distribution of Outcome: Count of Disasters

Modeling: Poisson vs Negative Binomial Regression

After inspecting the summary visualizations and plotting the distribution of the data, we decided to run a Poisson regression model to formally test the hypothesis that the count of natural disasters by US region is increasing between 1953-2018. The distribution illustrating the number of disasters per year indicated that our data was highly right skewed. Poisson modeling is appropriate because (1) we are modeling count data and (2) our data is right skewed with positive values only. Motivated by our exploratory visualizations, we also decided it was necessary to control for mean temperature and precipitation.

There are two main ways to model this data:

Poisson Regression: assumes mean = variance
Negative Binomial Regression: accounts for overdispersion where variance > mean

Variables of Interest

First, let’s consider our variables of interest.

Count of Disasters: The number of disasters by year and region.
Year: Ranges from 1953 to 2018.
Region: The US regions categorized by Midwest (reference category), Northeast, Southwest, and West. These regions were defined by the World Atlas definitions.
Average Temperature: The average temperature in degrees Fahrenheit by region.
Precipitation: The total amount of rainfall in inches by region.

Cross Validation

After testing various models with just Poisson, we fit a final model that takes the form:

\[ log(\lambda Count \ of \ Disasters) = \beta_0 + \beta_1 Year + \beta_2 Region + \beta_3 Average \ temp + \beta_4 Precipitation \]

We tested several interaction terms, but none were significant enough to include in the final model. Now we can use cross validation to test whether Poisson or Negative Binomial Regression fits our data best.

Reviewing the distributions of residual mean squared errors above, it is hard to distinguish between our two models. Instead, we can compare AICs and use the model with the lowest AIC.

Poisson Model

AIC
25241.77

Negative Binomial Model

AIC
2880.341

Using AIC as a measure for Goodness of Fit, we choose the Poisson Model.

poisson_output = poisson_model %>%
  broom::tidy() %>%
  dplyr::select(term, estimate, p.value) %>%
  mutate(exp(estimate)) 

poisson_output %>% 
  knitr::kable(digits = 3)

term	estimate	p.value	exp(estimate)
(Intercept)	-61.792	0.000	0.000
year	0.033	0.000	1.034
regionnortheast	-1.392	0.000	0.249
regionsoutheast	-0.621	0.000	0.538
regionwest	-0.776	0.000	0.460
ave_temp	-0.008	0.014	0.992
sum_precip	0.002	0.000	1.002

Main Findings

Fitting the Poisson regression model, the results were significant at the 5% level of significance. After adjusting for region in the US, average temperature, and precipitation, we found that for every increase in year, the expected count of natural disasters increases by 1.034 times. This supported our hypothesis that the count of natural disasters increased from 1953 to 2018.

As for geographic regions, we found that the Northeast, Southeast, and West regions have significantly less expected counts of natural disasters compared to the Midwest controlling for other variables in the model. For example, the Northeast has 0.248 times, or about 75% less, expected disasters compared to the Midwest.

The results for our climate indicators were unexpected. The average temperature is negatively associated while the total precipitation is positively associated with count of natural disasters. As temperature increases by 1-degree F, the expected count of natural disasters decreases by about 1% or 0.992 times controlling for other variables in the model. As total precipitation increases by 1-inch, the expected count of natural disasters increases by 1.002 times.

Reflecting on this, we can conclude that further analysis should be done on different types of disasters. For example, temperature may be positively associated with counts of fires or droughts. But these disasters may be negatively associated with precipitation.

Limitations

There are several large limitations to this analysis:

Year: We assumed a linear relationship with the continuous variable Year (1953-2018) and log of expected count of natural disasters. This is most likely not correct and further research must consider how to best use a time related variable.
Type of Natural Disaster: We grouped all natural disasters together. It is important for future researchers to consider how changing temperatures and precipitation could be associated differently with different types of disasters (drought vs blizzard).
FEMA Reporting: We used data from FEMA reported disaster declarations which included county level reporting. For this reason, one large blizzard could be reported by 10 counties and over counted in our data.
Missing Data: NOAA did not have climate indicator variables for the states of Hawaii and Alaska.
Other Variables: Our team considered only a subset of possible climate indicator variables. Future researchers should consider how else to represent climate change.