They may not disrupt your life as much as bedbugs, but nobody wants to live in a place where cockroaches roam free. They can trigger asthma, they can spread dangerous germs and they're just plain ugly.
Each week, New York City restaurant inspectors visit hundreds of restaurants and one of their jobs is to document sightings of live cockroaches. The inspectors add their findings to the publicly available restaurant reports, creating the only official database of roach sightings.
This website uses that data to show the ZIP codes around the city that had the highest number of roach sightings in the past week. We update the map every Monday, and every month we send out an email with the ZIP codes and neighborhoods that had the most roach sightings in the most recent four-week period.
We came up with this idea at The Great Urban Hack NYC, a two-day, overnight hackathon that brought together journalists, data scientists and developers on Nov. 6 and 7, 2010.
Read on for the technical details of how this works …
After brainstorming and chucking several ideas, five of us settled on creating a map of roach reports culled from city restaurant health inspection data. Since this site and the map were produced during a hackathon, think of it as a work in progress, one that's free to fork.
We used the NYC Data Mine “restaurant inspection results” raw data set, which describes almost 400,000 results from health inspections performed in restaurants across all of NYC. The most recent report can be downloaded by searching "restaurant inspection results" in the Data Mine raw data catalog.
We then went through the Violation.txt file within the data set to find the violation code, 04M, that corresponds to finding “live roaches present in facility's food and/or non-food areas.” This code’s current meaning went into effect July 26, 2010, so we chose to analyze only a 90-day window of data from the WebExtract.txt file.
We wrote a Python script that parses the data set’s WebExtract.txt file line-by-line, counting every single result for each ZIP code in New York in our window and every result specifically related to roaches. This script is called roach_parser.py.
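The counting step can be sketched roughly as follows. This is a minimal illustration, not the actual roach_parser.py: it assumes the file is comma-separated with a header row, and the column names ZIPCODE, VIOLCODE and INSPDATE and the date format are hypothetical placeholders to adjust to the real WebExtract.txt layout.

```python
import csv
from collections import Counter
from datetime import date, datetime, timedelta

ROACH_CODE = "04M"  # violation code for live roaches, as named in the text


def count_roach_results(lines, window_end, window_days=90,
                        zip_col="ZIPCODE", code_col="VIOLCODE",
                        date_col="INSPDATE", date_format="%Y-%m-%d"):
    """Count all results and roach results per ZIP code inside the window.

    The column names and date format are assumptions for illustration only.
    """
    window_start = window_end - timedelta(days=window_days)
    totals = Counter()   # every result per ZIP code
    roaches = Counter()  # results carrying the roach violation code

    for row in csv.DictReader(lines):
        inspected = datetime.strptime(row[date_col], date_format).date()
        if not (window_start <= inspected <= window_end):
            continue  # outside the analysis window
        zip_code = row[zip_col].strip()
        totals[zip_code] += 1
        if row[code_col].strip() == ROACH_CODE:
            roaches[zip_code] += 1
    return totals, roaches


# Tiny made-up sample, only to show the call shape.
sample = [
    "ZIPCODE,VIOLCODE,INSPDATE",
    "10001,04M,2010-10-01",
    "10001,10B,2010-10-02",
    "10002,04M,2010-01-01",  # falls outside the 90-day window
    "10002,04L,2010-10-15",
]
totals, roaches = count_roach_results(sample, window_end=date(2010, 11, 1))
```

With a real file, an open file handle can be passed in place of the sample list, since csv.DictReader reads it line by line.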
This count data tells us what percentage of inspections resulted in a violation for roaches. For that reason, we chose to analyze the data as draws from a binomial model. To estimate the parameters of this model, we used a hierarchical Bayesian model implemented both as an empirical Bayes approach in NumPy (draw_map.py) and using Gibbs sampling (jags_analysis.R) to correct for the large discrepancies in inspections across ZIP codes. Some ZIP codes only have a few inspections, while other ZIP codes have a very large number of inspections: the Bayesian approach allows us to pool information across all ZIP codes to make up for this discrepancy.
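To make the pooling idea concrete, here is a minimal empirical Bayes sketch in NumPy. It is not the code in draw_map.py: it assumes a Beta prior on each ZIP code's roach rate, fits that prior by the method of moments on the raw rates (one simple choice among several), and returns the posterior mean for each ZIP code. The function name and the input numbers are hypothetical.

```python
import numpy as np


def empirical_bayes_rates(roach_counts, total_counts):
    """Shrink raw per-ZIP roach rates toward the citywide average.

    Model assumed here: roach_count ~ Binomial(total_count, rate) and
    rate ~ Beta(alpha, beta), with alpha and beta estimated from the data.
    """
    k = np.asarray(roach_counts, dtype=float)
    n = np.asarray(total_counts, dtype=float)
    raw = k / n

    mean = raw.mean()
    var = raw.var()
    if var <= 0 or var >= mean * (1 - mean):
        raise ValueError("raw rates do not allow a method-of-moments Beta fit")

    # A Beta(alpha, beta) with mean m and variance v has
    # alpha + beta = m * (1 - m) / v - 1.
    strength = mean * (1 - mean) / var - 1
    alpha = mean * strength
    beta = (1 - mean) * strength

    # Posterior mean of the rate under the Beta-binomial model.
    return (k + alpha) / (n + alpha + beta)


# Made-up counts: one ZIP with 2 inspections, one with 100, one with 50.
k = [1, 30, 0]
n = [2, 100, 50]
shrunk = empirical_bayes_rates(k, n)
```

A ZIP code with only two inspections is pulled strongly toward the overall average, while one with 100 inspections barely moves, which is the correction for uneven inspection counts described above.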
With that information, we use Matplotlib to plot the estimates on a map of the city drawn from an NYC shapefile. This is done in draw_map.py.
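The plotting step might look something like the sketch below. It is not the real draw_map.py: it skips reading the shapefile entirely and assumes the ZIP code outlines have already been loaded into a dictionary of vertex lists. The function name, the dictionaries and the square "ZIP codes" in the example are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PolyCollection


def draw_rate_map(zip_polygons, zip_rates, out_path=None):
    """Shade each ZIP code outline by its estimated roach rate.

    zip_polygons: dict of ZIP code -> list of (x, y) vertices (assumed to be
                  extracted from the shapefile beforehand)
    zip_rates:    dict of ZIP code -> estimated rate between 0 and 1
    """
    # Only draw ZIP codes that have both an outline and a rate.
    zips = [z for z in zip_polygons if z in zip_rates]
    polys = [zip_polygons[z] for z in zips]
    values = np.array([zip_rates[z] for z in zips], dtype=float)

    fig, ax = plt.subplots(figsize=(6, 6))
    coll = PolyCollection(polys, cmap="Reds", edgecolors="black",
                          linewidths=0.5)
    coll.set_array(values)  # one color value per polygon
    ax.add_collection(coll)
    ax.autoscale_view()
    ax.set_aspect("equal")
    ax.set_axis_off()
    fig.colorbar(coll, ax=ax, label="Estimated roach violation rate")

    if out_path is not None:
        fig.savefig(out_path, dpi=150)
    return fig, coll


# Two made-up square "ZIP codes" with rates, plus one without a rate.
polygons = {
    "10001": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "10002": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "99999": [(2, 0), (3, 0), (3, 1), (2, 1)],
}
rates = {"10001": 0.3, "10002": 0.1}
fig, coll = draw_rate_map(polygons, rates)
```

Passing out_path writes the figure to an image file; outlines without a rate are simply left off the map in this sketch.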
Drawing Conclusions is...
The Roach Map is a work in progress. Our files are on GitHub.
If you're concerned about cockroaches, the city has information about what to do.
The Roach Map was one of a dozen Great Urban Hack NYC projects. See the others on HacksHackers.com.
Ready for more? Go back to the Roach Map.