Data Project: Scraping OSHA Inspections

One of the most challenging, yet most rewarding, projects I’ve completed for Engineering News-Record (ENR) was building a PHP scraper on ScraperWiki. ScraperWiki is a free service that lets data liberators build scrapers in PHP, Ruby, or Python; the scrapers are hosted on ScraperWiki’s servers and update automatically.

I’ve built a scraper that pulls construction industry inspection records from the OSHA database. The resulting dataset reveals some interesting things, for example which companies have the most violations.

This was my first time working with PHP. Luckily, the ScraperWiki website offers abundant tutorials and tips, and it helps to already understand another programming language.

The dataset stands to be a great resource for ENR and could be valuable to any industry-focused magazine. I invite anyone to explore the data or to use my example to build a scraper for OSHA inspections under a different NAICS code. All you have to do is change the NAICS code in the URLs being scraped (23 is construction) and adjust the number of times the loop runs.
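To illustrate the two settings mentioned above, here is a minimal Python sketch (not my original PHP scraper) that only builds the list of URLs the loop would fetch. The base URL and the `naics`, `offset`, and `limit` parameter names are hypothetical placeholders, because the actual OSHA query URL isn’t reproduced here; the number of records per page is also an assumption.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real OSHA inspection search URL
# and its actual query parameter names.
BASE_URL = "https://example.com/osha-inspection-search"


def build_search_urls(naics_code: str, pages: int, per_page: int = 100) -> list[str]:
    """Return one search URL per results page for a NAICS code.

    naics_code: industry prefix to scrape ("23" is construction).
    pages: how many times the scraping loop runs (one request per page).
    per_page: assumed number of records per results page.
    """
    urls = []
    for page in range(pages):
        query = urlencode({
            "naics": naics_code,        # hypothetical parameter name
            "offset": page * per_page,  # hypothetical paging parameter
            "limit": per_page,          # hypothetical page-size parameter
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls


if __name__ == "__main__":
    # Three pages of construction-industry results.
    for url in build_search_urls("23", 3):
        print(url)
```

Switching industries means passing a different NAICS prefix, and scraping more or fewer records means changing `pages`; the rest of the scraper stays the same.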

If you have a data set you would like scraped, I can help you build a scraper, or you can send it my way! I’m always looking for new and newsworthy data to mine. The ideal data for an automated scraper is too large to copy and paste by hand and is displayed online in a uniform format.
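Uniform formatting is what makes a scraper practical: when every record sits in the same table layout, one small parser can pull out every row. The following minimal sketch uses Python’s standard-library `html.parser` on a hypothetical HTML snippet; the markup and sample values are invented for illustration and are not OSHA data.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # completed rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rows(html: str) -> list[list[str]]:
    """Return every table row in the HTML as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical, uniformly formatted page fragment.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Company</th><th>Violations</th></tr>"
    "<tr><td>Acme Co.</td><td>3</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    for row in parse_rows(SAMPLE_HTML):
        print(row)
```

Because each row has the same structure, the same few lines handle one record or thousands, which is exactly the kind of job that is tedious by hand and easy to automate.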
