Nashville Traffic Stop Analysis

The most common interaction citizens have with police is the traffic stop: roughly 50,000 drivers are pulled over per day in the United States (Pierson et al.). Police departments across the country offer bountiful, yet disorganized, data sets. Thankfully, the Stanford Open Policing Project has standardized police traffic stop data. Spearheaded by the Stanford Computational Journalism Lab and the Stanford Computational Policy Lab, the project created a repository storing datasets of police traffic stops at the city, county, and state level. Every dataset follows a template, which makes it convenient to compare multiple cities, counties, and states.

Last spring semester, in the Data Science Research Circle class, I ran a logistic regression on multiple data sets. The model gave the likelihood of an individual being searched given their race, age, time of day, and location. Having learned more techniques in Computational Statistics, I decided to apply them in this research project. This paper examines one dataset, from Nashville, Tennessee, and takes a closer look at classifying the outcome for a driver based on multiple variables. Specifically, I applied a classification tree and a support vector machine to analyze police behavior and racial disparities in Nashville. I chose Nashville because the dataset offered the most numeric and categorical variables, 41 in total. Moreover, the dataset holds over 3 million observations spanning 2010 to 2019. With so many observations and limited computer memory, it seemed imperative to remove seemingly unnecessary variables and observations. As a result, I decided to look only at observations from 2017, giving well over 245,000 data points. A quick reduction I performed was removing variables with “raw” in their names, as those raw variables are redundant with the standardized variables.
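As a sketch of this reduction step, assuming a pandas workflow and a made-up miniature of the standardized schema (the real file would be read from the Open Policing repository with pd.read_csv):

```python
import pandas as pd

# Hypothetical miniature of the standardized stop schema; the real data
# would be loaded from the Stanford Open Policing repository instead.
stops = pd.DataFrame({
    "date": ["2016-05-01", "2017-03-14", "2017-11-02"],
    "subject_race": ["white", "black", "hispanic"],
    "raw_suspect_ethnicity": ["W", "B", "H"],  # redundant raw field
    "search_conducted": [False, True, True],
})

# Keep only 2017 stops.
stops["date"] = pd.to_datetime(stops["date"])
stops_2017 = stops[stops["date"].dt.year == 2017]

# Drop the redundant raw_* columns that mirror the standardized fields.
stops_2017 = stops_2017.drop(
    columns=[c for c in stops_2017.columns if c.startswith("raw_")])

print(len(stops_2017), list(stops_2017.columns))
```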

Exploratory Data Analysis

In exploratory data analysis I looked at the relationships among age, race, location, time, and police actions such as conducting a search or making an arrest. I will first examine the geographical location of police stops by race. The Leaflet package offers the ability to plot coordinates over a map. Figure 1 plots each observation’s latitude and longitude over a map of Nashville. It seemed unlikely that location would be separable, because traffic stops happen in moving vehicles and so should not directly reflect the demographics of where they occur. However, the plot reveals some noteworthy features. One is that many Black drivers cluster in the northwest, while White drivers are more evenly distributed. Moreover, the racial makeup becomes more mixed in the southeast portion of Nashville, showing more Hispanic drivers. The plot suggests that driver racial groups tend to cluster. However, more analysis is warranted, given that latitude and longitude make up only two of the dataset’s variables.

Map of coordinates colored by the driver’s race

Next, let’s examine a more empirical view of the racial makeup as a function of different variables. Numerically, Black and Hispanic drivers face more searches by police than White drivers do. However, this does not account for the population size of each race. As a result, I decided to use the outcome test, developed by Gary Becker. The method analyzes the hit rate: how often a police officer finds contraband on a searched driver. A hit is defined as true if an officer searches a driver and finds contraband, and false if an officer searches a driver but does not find anything. Figure 2 shows the hit rates for Black, Hispanic, and White drivers.

Plot shows hit rates per race
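The hit rate itself is simple to compute; a minimal sketch over hypothetical stop records (the real columns are subject_race, search_conducted, and contraband_found):

```python
from collections import defaultdict

# Hypothetical per-stop records standing in for the Nashville data.
stops = [
    {"subject_race": "black",    "search_conducted": True,  "contraband_found": False},
    {"subject_race": "black",    "search_conducted": True,  "contraband_found": True},
    {"subject_race": "white",    "search_conducted": True,  "contraband_found": True},
    {"subject_race": "white",    "search_conducted": False, "contraband_found": False},
    {"subject_race": "hispanic", "search_conducted": True,  "contraband_found": False},
]

searches = defaultdict(int)
hits = defaultdict(int)
for stop in stops:
    if stop["search_conducted"]:  # the outcome test conditions on searches only
        searches[stop["subject_race"]] += 1
        hits[stop["subject_race"]] += stop["contraband_found"]

hit_rate = {race: hits[race] / searches[race] for race in searches}
print(hit_rate)  # → {'black': 0.5, 'white': 1.0, 'hispanic': 0.0}
```

Under the outcome test, a lower hit rate for one group suggests officers search that group on weaker evidence.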

The plot reveals that White drivers get pulled over less frequently but exhibit higher hit rates. In fact, the hit rate for White drivers is higher than that for Hispanic drivers. This suggests that the police are applying a double standard to drivers of color. Since the outcome test signals differences between races, the variables search_conducted and contraband_found should be taken into consideration for the learner. Figure 3 looks at the number of stops by age, colored by race. Among all the racial groups, the number of stops seems to differ by roughly the same amount at every age up to 75. Furthermore, young drivers seem to get stopped by the police most often, but the number goes down as age increases. The plot hints at the idea that age and race influence each other in traffic stop outcomes.

Plot shows number of drivers stopped at each age, colored by race

The last plot in the EDA shows the count of drivers plotted against the time of day. As expected, most traffic stops occur during the day (probably when people leave for work) and decrease during working hours. Plotting hours of the day against traffic stop counts was inspired in part by the veil of darkness test (Pierson et al.). The veil of darkness test suggests that in the dark, police officers are less biased toward racial groups because they cannot see skin color, in contrast to daylight stop rates. A noteworthy observation is that White drivers tend to show the most variance among the racial groups, with Black drivers following behind.

Smoothed line plot of the number of traffic stops at each hour
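Binning stops by hour of day, the x-axis of that plot, is a one-liner; a small sketch with made-up timestamps in the standardized "HH:MM:SS" format:

```python
from collections import Counter

# Hypothetical stop times in the standardized "HH:MM:SS" format.
times = ["07:15:00", "07:48:12", "08:05:30", "17:20:00", "23:55:10"]

# Count stops per hour of day; these counts feed the smoothed line plot.
stops_per_hour = Counter(int(t.split(":")[0]) for t in times)
print(sorted(stops_per_hour.items()))  # → [(7, 2), (8, 1), (17, 1), (23, 1)]
```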

These findings suggest traffic stops tend to cluster by racial group in Nashville. This surprised me because these are vehicular traffic stops. The fact that a traffic stop’s coordinates can help classify a driver indicates larger systemic factors at work, such as demographics or socioeconomic status. When a police officer searches a driver, there seems to be racial bias against Black and Hispanic drivers. This raises questions, discussed further below, about how to organize the rich dataset. Age plays a factor when drivers are young, but the racial groups tend toward the same count after 75 years old. However, the wide differences in frequencies by racial group, coupled with the fact that young drivers get pulled over more, make age an interesting variable. Location, age, searches, and contraband will serve as the primary variables in our model to classify a driver’s race and whether a search was conducted.

Classification Tree

The first learner I implemented was a classification tree. While the exploratory data analysis showed non-linear data points, a classification tree is useful for determining which variables are significant by observing the split points. I ran this classification tree with the variables lat, lng, subject_race, hour, search_conducted, and contraband_found. Figure 5 shows the results from the classification tree. The tree uses latitude and longitude as its first split points. Next, it uses subject age. It is worth noting that the tree did not use search_conducted or contraband_found in its splits. In fact, looking more closely at the dataset reveals that only about 4% of all traffic stops involved a search. Going forward, this suggests a trade-off in choosing our data to learn from: we can either look at more data points, increasing the ratio of total traffic stops to searched stops, or filter the data so that we include only stops where a search was conducted. The latter gives us far fewer data points.

Classification Tree on Racial Groups
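To make the split-point idea concrete, here is a minimal hand-rolled version of the impurity search that rpart performs at each node, run on made-up latitudes where Black drivers sit to the north:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Find the threshold on one variable minimizing weighted Gini impurity."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical latitudes: Black drivers clustered to the north (higher lat).
lat  = [36.10, 36.12, 36.14, 36.20, 36.22, 36.24]
race = ["white", "white", "white", "black", "black", "black"]
print(best_split(lat, race))  # → (36.14, 0.0): a perfectly clean split on latitude
```

A variable like latitude that yields low-impurity splits rises to the top of the tree, which is exactly the pattern in Figure 5.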

Figure 6 is a map showing the decision boundaries for latitude and longitude only. At a glance, rpart seems to do a satisfactory job. For the most part, it captures the general location of Black drivers in northwest Nashville and Hispanic drivers in the southeast. However, the numerous variables and data types made it worthwhile to try a support vector machine and a random forest for more flexibility.

Classification Map

SVM

The first support vector machine I ran classified racial groups based on latitude, longitude, hour of day, search conducted, and contraband found. The data frame did contain missing values: about 132 missing age values, 191 missing subject_race values, and 5 missing search_conducted values. Given that the dataset contained 23,877 observations and I would be narrowing it to stops where a search was conducted, I decided not to impute any values.

The SVM gave mediocre results: the test accuracy was 65%. The SVM was trained on 8,545 training points, which is a rather small training size. Black drivers showed the highest true-positive rate at 0.87, whereas Asian drivers had the highest true-negative rate at 1.0. One thing that struck me as significant is that Black and White drivers had almost opposite sensitivity and specificity. This suggests that the model is good at distinguishing Asian, Hispanic, and other racial groups from Black and White drivers. It is also worth mentioning that the variables included search conducted and contraband found. Based on the exploratory data analysis, there might be better signal in combining search_conducted and contraband_found into one variable, miss, that flags stops where a search was conducted but no contraband was found. Another run is warranted.
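A minimal sketch of this kind of model, assuming scikit-learn and made-up stop records (the actual analysis was run on the Nashville data with different sizes and tuning):

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical feature rows mirroring the variables used here:
# latitude, longitude, hour of day, search_conducted, contraband_found.
X = [
    [36.20, -86.80,  9, 1, 0],
    [36.21, -86.81, 22, 1, 1],
    [36.10, -86.70,  8, 0, 0],
    [36.11, -86.71, 17, 0, 0],
    [36.15, -86.75, 13, 1, 0],
    [36.16, -86.74,  7, 0, 0],
    [36.19, -86.79, 23, 1, 1],
    [36.12, -86.72, 10, 0, 0],
]
y = ["black", "black", "white", "white",
     "hispanic", "hispanic", "black", "white"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A radial-basis kernel handles the non-linear boundaries seen in the EDA.
model = SVC(kernel="rbf", gamma="scale")
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```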

Random Forest

The random forest scored 62% accuracy on the same data used for the first SVM. This rather low score prompted me to rearrange the data to emulate the outcome test.
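A hedged sketch of the random forest, again assuming scikit-learn and made-up rows; one useful byproduct is the feature-importance vector, which echoes what the single tree’s split points revealed:

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up feature rows: latitude, longitude, hour, search, contraband.
X = [
    [36.20, -86.80,  9, 1, 0],
    [36.21, -86.81, 22, 1, 1],
    [36.10, -86.70,  8, 0, 0],
    [36.11, -86.71, 17, 0, 0],
    [36.15, -86.75, 13, 1, 0],
    [36.19, -86.79, 23, 1, 1],
]
y = ["black", "black", "white", "white", "hispanic", "black"]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Importances sum to 1 and hint at which variables drive the splits.
print(dict(zip(["lat", "lng", "hour", "search", "contraband"],
               forest.feature_importances_.round(2))))
```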

SVM With Misses

On the next SVM run, I replaced search_conducted and contraband_found with a new variable called miss. Miss is true if a search was conducted but no contraband was found, and false if contraband was indeed found. The accuracy for this test was slightly higher at 68%, but not significantly different. The test accuracy supports the idea that hit rates carry more signal than searches and contraband findings alone. Furthermore, it suggests that the police are applying a double standard toward racial groups, amounting to racial discrimination. Another noteworthy result was that sensitivity increased for White drivers but decreased for Black drivers. However, similar to the previous learner, Black and White drivers had nearly opposite sensitivity and specificity, indicating that it was challenging for the learner to classify them cleanly. Consequently, the high specificity among the Asian, Hispanic, and other drivers shows that the learner could distinguish them better than it could White and Black drivers.
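Deriving miss is a small transformation; a sketch over hypothetical records, restricted (as in this run) to stops where a search actually occurred:

```python
# Hypothetical stop records; miss = searched but no contraband found.
stops = [
    {"search_conducted": True,  "contraband_found": False},  # a miss
    {"search_conducted": True,  "contraband_found": True},   # a hit
    {"search_conducted": False, "contraband_found": False},  # excluded: no search
]

# Keep only searched stops, then flag the ones where nothing was found.
searched = [s for s in stops if s["search_conducted"]]
for s in searched:
    s["miss"] = not s["contraband_found"]

print([s["miss"] for s in searched])  # → [True, False]
```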

Conclusion

Overall, the learners did not give the strongest results. The support vector machine with the created variable, miss, gave the best predictions at 68% accuracy. While the improvement is small, it supports the outcome test in suggesting that police show racial discrimination toward Black and Hispanic drivers. In fact, the SVM could distinguish non-White and non-Black drivers fairly well, as the Asian, Hispanic, and other ethnic groups boasted high specificity scores. Some goals for a better classifier would be to examine discrimination across multiple cities, use the veil of darkness test, and try manipulating more variables. The biggest takeaway from running these experiments is that a larger sample size could help. Unfortunately, filtering to traffic stops where a search was conducted meant dropping the number of observations from 23,877 to about 9,000. That’s a pretty drastic decrease, but it could be remedied by joining datasets from other cities and, perhaps, looking at another measure of racial discrimination. Further research is warranted.

References

Pierson, E., Simoiu, C., Overgoor, J. et al. A large-scale analysis of racial disparities in police stops across the United States. Nat Hum Behav 4, 736–745 (2020). https://doi.org/10.1038/s41562-020-0858-1