INTRODUCTION

What is the root cause of crime? This is a complex question with many potential answers, such as poverty or gun ownership. In this project, I aim to shed some light on this question by exploring the relationship between environmental and demographic factors and crime in the state of Illinois, with a focus on violent and property crimes. To do this, I will use county-level data for the year 2019 to obtain or create the following variables: crime per capita, median income, FOID (Firearm Owner Identification) card ownership per capita, poverty rate, and population density. The primary objective is to explore how crime per capita correlates with the other four variables and thereby gain an idea of which factors are most strongly associated with crime and may be among its causes.

CONCEPTUAL MODEL

An environment is not simply a physical space characterized by natural objects like trees and rivers; instead, it encompasses any conditions of the area in which we exist. In this project, I focus on the social environment of Illinois, of which crime is a part. Although crime is the main environmental aspect being modeled, I am also modeling FOID card ownership, which I consider to be a part of the social environment. Additionally, I am examining two economic variables, median income and the poverty rate, and one spatial variable, population density. Population density is spatial because it relates each county's population to the physical area the county covers.

I chose these four variables to assess against crime because, in my opinion, they are some of the most relevant when talking about the causes of crime in today’s society. One of the most commonly mentioned factors in this realm is economic hardship, with the idea being that poverty-stricken people are more likely to fall into crime. I am selecting the variables poverty rate and median income to act as indicators of economic hardship and expect to find that the poverty rate has a positive correlation with crime per capita, while median income has a negative correlation with crime per capita.

Guns are another component at the forefront of crime discussions. Two contrasting ideas on this subject are that more guns in circulation will lead to more crime and, conversely, that more guns will lead to less crime. A significant obstacle to testing either idea is that neither the US nor the Illinois government tracks owned firearms, so the number of guns owned can only be estimated. Fortunately, the Illinois government does track FOID cards, which are required to own a firearm in Illinois, so I will use this data, specifically FOID cards per capita, as a proxy for firearm ownership. This will be interesting to compare with crime per capita because, under the assumption that legal gun ownership deters crime, I should expect to find a negative correlation between crime per capita and FOID cards per capita. However, such a correlation could also arise because an area with low crime is likely to have fewer felons, and therefore more people who are legally able to own a gun. On the other hand, a positive correlation could mean one of two things: more legally owned guns lead to more crime, or people in higher-crime areas are more likely to obtain FOID cards and guns for self-defense.

Finally, consider the idea that most major cities are hotspots for crime. A characteristic of such cities is higher population density, which is why I think population density could be a general attribute of high-crime areas and potentially a cause of crime. One possible reason is that the denser a population is, the more social interactions occur, which means more opportunities for violent and property crimes.

DATA WRANGLING

I chose data for the year 2019 because it predates the COVID-19 pandemic, which could have affected the data. The crime data came from the Illinois State Police. Originally, I searched for crime data in the FBI database, but the 2019 data was incomplete, which I realized after noticing unusually small crime numbers. The Illinois State Police data was not perfect either, as a few counties had unreported crimes, but I decided it was acceptable to omit these counties from my analysis. This dataset contained both reported crimes and arrests, and I felt it appropriate to use the reported crime numbers, as not all crimes committed lead to an arrest.

Next, I obtained the income, poverty, and population data from the US Census Bureau. These were simple to obtain with some searching and filtering on the US Census Bureau website, again all for the year 2019. The FOID data came with one issue. Despite my efforts, I could not find any dataset containing the number of FOID cards per Illinois county anywhere except the Illinois State Police website. The problem is that nowhere on this website could I find data for specific years, only a single unlabeled dataset containing the FOID card counts with no indication of when the data is from. I could only assume this data is current or somewhat recent, which means I was forced to compare what I treat as 2023 FOID card data with the other data from 2019. Although this inconsistency is not ideal, it should be acceptable for these purposes, assuming that FOID card numbers changed by a similar proportion in each county between 2019 and 2023.

Lastly, I needed to obtain shapefiles for Illinois with county boundaries to map data and obtain the area for each county, needed for creating the population density variable. These shapefiles were obtained from the Illinois State Geological Survey.

All the datasets were in CSV form, so it was simple to create four data frames for FOID cards, income, poverty rates, and population by reading in the CSVs. I then read in the Illinois shapefile.

library(sp)
library(tmap)
library(dplyr)
library(sf)
library(tmaptools)
library(tidyverse)
# Read in Data for Foid cards, Median income, Poverty Rate, and Population
foids <- read.csv("CardsPerCounty_data.csv")
income <- read.csv("IllinoisCountyIncome2019.csv")
poverty <- read.csv("PovertyData2019.csv")
pop <- read.csv("Population2019.csv")

# Read in Illinois Shapefile
ill <- read_sf("IL_BNDY_County_Py.shp")

There were some inconsistencies with the county names between the different datasets that I discovered. Firstly, the Illinois shapefile had county names in all uppercase, while the others did not. So, I had to fix this difference by making the county name in all files uppercase. After some further wrangling and mapping the data, I realized that one county was missing from my dataset for seemingly no reason. I determined that it was happening when merging several datasets later on, which led me to believe that there must be some inconsistency between these datasets. Using the mapped data, I determined that the missing county was De Witt. The issue was that in some datasets, it was labeled as “DEWITT,” and in others as “DE WITT.” So, I replaced the county name in the appropriate datasets with “DEWITT,” which fixed this issue.

# Make all County Names Upper Case for consistency 
foids$County <- toupper(foids$County)
income$County <- toupper(income$County)
poverty$County <- toupper(poverty$County)
pop$County <- toupper(pop$County)

# Fix issue with De Witt county name inconsistency 
income$County <- gsub("DE WITT", "DEWITT", income$County)
pop$County <- gsub("DE WITT", "DEWITT", pop$County)
poverty$County <- gsub("DE WITT", "DEWITT", poverty$County)

The next step I took was to create an overall dataset, which I did by merging the previous five datasets into one. Naturally, I merged them by county name. Then, in order to calculate some necessary variables, I removed commas from the population variable and made it numeric. After also converting the FOID cards variable to numeric, I calculated the cards per capita by dividing this number by the population. Then, I also calculated the population density of each county by dividing the population by the area for each county, with the area being obtained using the ‘st_area’ function on the geometry for each county.

# Merge all data sets into one (ill, foids, income, poverty, population)
illData <- merge(ill, foids, by.x = "COUNTY_NAM", by.y = "County")
illData <- merge(illData, income, by.x = "COUNTY_NAM", by.y = "County")
illData <- merge(illData, poverty, by.x = "COUNTY_NAM", by.y = "County")
illData <- merge(illData, pop, by.x = "COUNTY_NAM", by.y = "County")

# Make population variable numeric and Remove commas from Population variable in order to use as.numeric
illData$Population <- as.numeric(gsub(",","", illData$Population))

# Make FOID cards per capita
illData$Cards <- as.numeric(illData$Cards)
illData$cardsPerCapita <- illData$Cards  / illData$Population

# Create population density variable
illData$popDens <- (illData$Population) / st_area(illData$geometry) 

glimpse(illData)
## Rows: 102
## Columns: 13
## $ COUNTY_NAM              <chr> "ADAMS", "ALEXANDER", "BOND", "BOONE", "BROWN"…
## $ CO_FIPS                 <int> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25,…
## $ Cards                   <dbl> 16841, 1394, 5085, 12121, 1709, 8962, 1920, 45…
## $ TotalHouseholds         <chr> "27,112", "2,154", "6,299", "18,571", "2,055",…
## $ Median.income..dollars. <chr> "52,993", "36,806", "57,289", "69,272", "61,65…
## $ Mean.income..dollars.   <chr> "72,053", "46,981", "72,369", "89,482", "71,08…
## $ PopIncl                 <chr> "64,506", "6,143", "14,824", "53,026", "4,728"…
## $ BelowPovLevel           <chr> "8,031", "1,553", "2,098", "5,341", "549", "4,…
## $ Percent                 <chr> "12.50%", "25.30%", "14.20%", "10.10%", "11.60…
## $ Population              <dbl> 65435, 5761, 16426, 53544, 6578, 32628, 4739, …
## $ geometry                <POLYGON [°]> POLYGON ((-91.50534 40.2002..., POLYGO…
## $ cardsPerCapita          <dbl> 0.2573699, 0.2419719, 0.3095702, 0.2263746, 0.…
## $ popDens                 [1/m^2] 2.903192e-05 [1/m^2], 8.787053e-06 [1/m^2], …
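
The popDens values above are in people per square meter, because st_area() reported the county areas in square meters. As a small worked conversion (a sketch, using the first density value printed above), multiplying by one million gives the more familiar people per square kilometer:

```r
# Density for the first county as printed in the glimpse output (people per m^2)
densPerM2 <- 2.903192e-05

# 1 km^2 = 1,000,000 m^2, so multiply by 1e6 to get people per km^2
densPerKm2 <- densPerM2 * 1e6
densPerKm2  # roughly 29 people per km^2
```

Rescaling does not change any correlation coefficient, so this only affects how readable the axis and the printed values are.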

Next, I read in the crime data, which was also in CSV format. The main wrangling needed for this involved obtaining the total reported crimes for only violent and property crimes based on the classifications provided by the Illinois State Police. The types that fall under violent and property crimes are homicide, criminal sexual assault, robbery, aggravated battery/assault, burglary, theft, motor vehicle theft, and arson. So, I summed the total for these crimes while also removing commas from the counts and making the variable numeric. I then created a subset of this dataset by selecting just the total crime and county columns. Similar to earlier, I made the county names uppercase for consistency.

# Read in crime data
crime <- read.csv("2019-18 Index Crime.csv")

# Columns holding the 2019 counts for the eight violent and property crime types
crimeCols <- c("CH19", "Rape19", "Rob19", "AggBA19", "Burg19", "Theft19", "MVT19", "Arson19")

# Remove commas from each count column and make it numeric (unreported counts become NA)
crimeCounts <- sapply(crime[crimeCols], function(x) as.numeric(gsub(",", "", x)))

# Create variable in data set for total Violent and Property crime for each county
# (the total is NA if any of the eight counts is NA)
crime$TotalVPCrime <- rowSums(crimeCounts)
## Warning: NAs introduced by coercion

# Create new data set by selecting needed variables
vpCrime <- dplyr::select(crime, County, TotalVPCrime)

# Make County uppercase for consistency
vpCrime$County <- toupper(vpCrime$County)

glimpse(vpCrime)
## Rows: 102
## Columns: 2
## $ County       <chr> "ADAMS", "ALEXANDER", "BOND", "BOONE", "BROWN", "BUREAU",…
## $ TotalVPCrime <dbl> 1624, 59, 126, 496, 11, 352, NA, 146, 361, 5470, 262, 132…
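
The NA in TotalVPCrime above arises because adding counts propagates missing values: if any one crime category is unreported for a county, the whole total becomes NA. The sketch below illustrates this with hypothetical counts (the "N/A" placeholder and the toCount helper are assumptions for illustration; the real file may mark unreported values differently).

```r
# Hypothetical counts for two counties; the second has one unreported category
counts <- data.frame(Burg19 = c("1,204", "35"), Theft19 = c("3,310", "N/A"))

# Illustrative helper: strip commas and convert to numeric.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
toCount <- function(x) suppressWarnings(as.numeric(gsub(",", "", x)))

burg  <- toCount(counts$Burg19)
theft <- toCount(counts$Theft19)

# Plain addition keeps the NA, so the second county has no total
totalStrict <- burg + theft

# rowSums(..., na.rm = TRUE) would instead treat the unreported count as zero
totalLenient <- rowSums(cbind(burg, theft), na.rm = TRUE)
```

Keeping the NA, as the report does, avoids understating crime in counties with partial reporting, at the cost of dropping those counties from the analysis.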

After calculating the total crime variable, I merged this dataset with the large overall dataset from earlier, ensuring that all counties are included even if there is an NA value for the total crime. Then, I calculated the crime per capita variable by dividing the total crime variable by the population, and I visualized this on a map.

# Merge crime data with illData data set
illData <- merge(illData, vpCrime, by.x = "COUNTY_NAM", by.y = "County", all.x = TRUE)

# Calculate crime per capita for each county
illData$crimePerCapita <- illData$TotalVPCrime / illData$Population

glimpse(illData)
## Rows: 102
## Columns: 15
## $ COUNTY_NAM              <chr> "ADAMS", "ALEXANDER", "BOND", "BOONE", "BROWN"…
## $ CO_FIPS                 <int> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25,…
## $ Cards                   <dbl> 16841, 1394, 5085, 12121, 1709, 8962, 1920, 45…
## $ TotalHouseholds         <chr> "27,112", "2,154", "6,299", "18,571", "2,055",…
## $ Median.income..dollars. <chr> "52,993", "36,806", "57,289", "69,272", "61,65…
## $ Mean.income..dollars.   <chr> "72,053", "46,981", "72,369", "89,482", "71,08…
## $ PopIncl                 <chr> "64,506", "6,143", "14,824", "53,026", "4,728"…
## $ BelowPovLevel           <chr> "8,031", "1,553", "2,098", "5,341", "549", "4,…
## $ Percent                 <chr> "12.50%", "25.30%", "14.20%", "10.10%", "11.60…
## $ Population              <dbl> 65435, 5761, 16426, 53544, 6578, 32628, 4739, …
## $ cardsPerCapita          <dbl> 0.2573699, 0.2419719, 0.3095702, 0.2263746, 0.…
## $ popDens                 [1/m^2] 2.903192e-05 [1/m^2], 8.787053e-06 [1/m^2], …
## $ TotalVPCrime            <dbl> 1624, 59, 126, 496, 11, 352, NA, 146, 361, 547…
## $ geometry                <POLYGON [°]> POLYGON ((-91.50534 40.2002..., POLYGO…
## $ crimePerCapita          <dbl> 0.024818522, 0.010241278, 0.007670766, 0.00926…
# Crime per capita map
tm_shape(illData) + tm_fill("crimePerCapita", palette = "YlOrRd", style = "quantile", title = "Crime Per Capita") + tm_borders()
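
Crime per capita values are small fractions, which can be hard to read on a map legend. As a worked example (a sketch using the Adams County figures printed above), the same quantity can be expressed as a rate per 100,000 residents, a common convention for crime statistics:

```r
# Adams County values from the glimpse output above
totalCrime <- 1624
population <- 65435

# Crime per capita, as computed in the report
crimePerCapita <- totalCrime / population

# Equivalent rate per 100,000 residents
crimePer100k <- crimePerCapita * 100000
crimePer100k  # roughly 2,482 reported crimes per 100,000 residents
```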

EXPLORATORY METHODS

My method for exploring the correlation between crime and the other factors is simple. I will create a scatter plot for each factor with crime per capita on the y-axis to visualize the relationship. Then, I will calculate the Pearson correlation coefficient to get a more concrete measure of the correlation. If any of these correlation values are high, that could suggest the factor is a cause of crime, although a correlation on its own cannot rule out other explanations.
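
As a minimal sketch of this approach, the example below uses hypothetical values for six counties (not taken from the report's data). It also shows cor.test() from base R's stats package, which is not used in the report but reports a p-value and confidence interval alongside the same coefficient and could help judge how much weight a given value deserves.

```r
# Hypothetical values for six counties (illustrative only)
povertyRate    <- c(8.5, 10.1, 12.5, 14.2, 18.0, 25.3)
crimePerCapita <- c(0.006, 0.009, 0.025, 0.008, 0.015, 0.010)

# Pearson correlation coefficient, the quantity cor() returns by default
r <- cor(povertyRate, crimePerCapita)

# cor.test() adds a p-value and a 95% confidence interval for that coefficient
result <- cor.test(povertyRate, crimePerCapita)
result$estimate   # same value as r
result$p.value    # probability of a correlation this extreme if the true value were zero
result$conf.int   # plausible range for the true correlation
```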

RESULTS

Poverty Rate

First, I evaluated the correlation between crime per capita and the poverty rate. I created a new subset including just the poverty rate and crime per capita for each county, omitting rows with NA values for either variable. Then, I removed the percent symbol from the poverty rate variable and made it numeric in order to plot it. The plot showed a moderate positive correlation between the poverty rate and crime per capita, supported by the correlation coefficient of 0.450.

# Create variable with needed attributes, omit NA values
povertyPlot <- na.omit(illData[c("Percent", "crimePerCapita")])

# Remove '%' from percents and make numeric
povertyPlot$Percent <- as.numeric(gsub("%","",povertyPlot$Percent))

# Scatter plot: Crime and Poverty Rate
plot(povertyPlot$Percent, povertyPlot$crimePerCapita, 
     xlab = "Poverty Rate %", ylab = "Crime Per Capita",
     main = "Crime Per Capita vs Poverty Rate in Illinois Counties")

# Correlation coefficient
povertyCor <- cor(povertyPlot$Percent, povertyPlot$crimePerCapita)
povertyCor
## [1] 0.4500325

Median Income

Next was median income. Again, I selected the two needed variables from the overall dataset while omitting any rows with NA values. Afterwards, I removed commas from the median income variable and made it numeric. Then I was able to plot this, where I observed a very slight negative correlation and also calculated the correlation coefficient of -0.234.

# Create variable with needed attributes, omit NA values
incomePlot <- na.omit(illData[c("Median.income..dollars.", "crimePerCapita")])

# Remove commas from number, make numeric
incomePlot$Median.income..dollars. <- as.numeric(gsub(",","",incomePlot$Median.income..dollars.))

# Scatter plot: Crime and Median Income
plot(incomePlot$Median.income..dollars., incomePlot$crimePerCapita, 
     xlab = "Median Income", ylab = "Crime Per Capita",
     main = "Crime Per Capita vs Median Income in Illinois Counties")

# Correlation coefficient
incomeCor <- cor(incomePlot$Median.income..dollars., incomePlot$crimePerCapita)
incomeCor
## [1] -0.2340086

FOID Cards

Then, for FOID cards, I obtained the needed data and omitted rows with NA values again. This plot showed a slight negative correlation, which was supported by the correlation coefficient of -0.399.

# Create variable with needed attributes, omit NA values
foidPlot <- na.omit(illData[c("cardsPerCapita", "crimePerCapita")])

# Scatter plot: Crime and FOID Cards
plot(foidPlot$cardsPerCapita, foidPlot$crimePerCapita, 
     xlab = "Foid Cards Per Capita", ylab = "Crime Per Capita",
     main = "Crime Per Capita vs FOID Cards Per Capita in Illinois Counties")

# Correlation coefficient
foidCor <- cor(foidPlot$cardsPerCapita, foidPlot$crimePerCapita)
foidCor
## [1] -0.3990309

Population Density

Finally, I similarly selected, edited, and plotted the population density with crime per capita. This plot was extremely vertically clustered around the same x value, which made it impossible to gather any idea of correlation visually. However, there was a very slight positive correlation indicated by the correlation coefficient of 0.221.

# Create variable with needed attributes, omit NA values
popDensPlot <- na.omit(illData[c("popDens", "crimePerCapita")])

# Scatter plot: Crime and Population Density
plot(popDensPlot$popDens, popDensPlot$crimePerCapita, 
     xlab = "Population Density", ylab = "Crime Per Capita",
     main = "Crime Per Capita vs Population Density in Illinois Counties")

# Correlation coefficient
popDensCor <- cor(popDensPlot$popDens, popDensPlot$crimePerCapita)
popDensCor
## [1] 0.2206316
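
The vertical clustering described above is typical when a few very dense counties stretch the x-axis. One possible remedy, sketched here with hypothetical densities rather than the report's data, is to plot the x-axis on a log scale and to compare the Pearson coefficient with Spearman's rank correlation, which depends only on the ordering of the values.

```r
# Hypothetical densities (people per km^2): most counties are sparse, one is very dense
density        <- c(9, 12, 15, 20, 29, 45, 80, 2100)
crimePerCapita <- c(0.004, 0.010, 0.007, 0.009, 0.025, 0.012, 0.015, 0.030)

# A log-scaled x-axis spreads out the sparse counties instead of stacking them
plot(density, crimePerCapita, log = "x",
     xlab = "Population Density (log scale)", ylab = "Crime Per Capita")

# Pearson is sensitive to the single dense outlier; Spearman uses ranks only
pearson  <- cor(density, crimePerCapita)
spearman <- cor(density, crimePerCapita, method = "spearman")
```

Because Spearman's coefficient uses ranks, it is unchanged by taking the log of the densities, so it gives one number that does not depend on the choice of axis scale.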

DISCUSSION

My goal for this project was to gain some idea of what causes crime, and frankly, I am not completely satisfied with my findings. Firstly, two of the correlation coefficients I calculated were weak, both having a magnitude of around 0.2. These were the coefficients for median income at -0.234 and population density at 0.221. The signs of the correlations were as expected, with median income being negatively correlated, implying that higher-income areas have less crime, and population density being positively correlated, implying that denser areas have more crime. Still, these values are too small to give much weight.

The other two correlation coefficients were stronger. Firstly, the coefficient for the poverty rate was 0.450. Unfortunately, this is still not high enough to say that the poverty rate is strictly the cause of crime, but I believe it is high enough to say that it is one notable factor. Next is the correlation coefficient for FOID cards per capita of -0.399. A negative correlation is consistent with the idea that more FOID cards per capita leads to less crime per capita, but again, this coefficient is not high enough to make such a bold claim. It could also simply mean that areas with less crime have more FOID cards per capita for a different reason, for example because these areas have fewer felons and therefore fewer people prevented from obtaining a FOID card. I would like to believe the former and say that more legal gun ownership is one factor in deterring crime.

In conclusion, based on my findings, the poverty rate is just one significant factor that leads to more crime, but it is not the sole root cause of crime. Legal gun ownership, on the other hand, is one potential crime deterrent, but it is not the solution to stop all crime. It is now my belief that, in order to reduce a significant amount of crime, we should focus on pulling people out of poverty and destigmatizing the idea of legal gun ownership. However, this is not a perfect solution, nor is it guaranteed to be a solution at all, but this is my interpretation. The truth of the matter is that so many factors could lead to or prevent crime, and also, some people are just evil, which is hard to stop. I believe this topic should be researched further, and then maybe, with some proper legislation and education, we can make the world a bit safer.