I’ve moved!

As a project for the new year, I finally set up a website for myself, complete with a WordPress.org blog. There are already some new posts on that new blog, so please come check it out! The new name is Datum By Datum, and you can check out the new blog space here.

Thank you for stopping by!



Infographics and Data Visualization Class: A Few Last-Minute Mock-Ups

The class is over, and unfortunately I fell behind on uploading data visualization drawings. These are the sketches I did for the final two assignments, week 4 and week 6.

In week 4, Alberto Cairo asked us to work with some unemployment data. There isn’t a media outlet that hasn’t covered unemployment with an infographic or data visualization. I decided to create an interactive graphic that answers the question I don’t see answered in any others: what industries were hit hardest in each state?

I usually don’t go for maps, but I feel like this would be a good way for a typical reader to decide if iconic geographical regions were affected in ways we’d expect. Did the steel states lose their automotive jobs? Did the breadbasket of America lose agriculture workers? I tried to represent the state’s primary industry losses with a little icon for a large NAICS or SIC code series. It’s data that would be easy to grab from BLS, but only further investigation would show whether every state would have the same icon on it.
unemployment interactive graphic

This is the mock-up for my final infographic design, based on construction injuries and fatalities. I decided against a map here; just a few simple illustrations of the numbers and trends that I found in the data.
worker injury graphic

A lot of people have asked me whether they should take the class, and what to expect. I think it was great! The fellow students in the forums made a fantastic community for reviewing visualizations and making suggestions. The lectures and readings offer real insight into how newsrooms with strong infographics teams approach their projects. And it’s free, so there’s no way to lose!

Explainer: St. Louis and Crime Rankings

By Ray Closson via fotocommunity.com

St. Louis Arch, Courthouse, and Kiener Plaza

Poor St. Louis. It always gets such a bad rap when media groups decide to rank cities by crime rate. The reputation is not a well-earned one, though, and criminologists are quite vocal about that. The FBI even publishes this Caution Against Ranking with the statistics each year. You can criticize the rankings’ lack of context and applicability by pointing to the data they come from (the FBI’s Uniform Crime Reports), to the way they generalize large, varied cities with a blanket of “dangerous” or “not dangerous,” or to the fact that numbers alone provide very little context for a single city’s crime issues. But there’s a specific statistical reason why St. Louis always ends up near the top of these rankings.

Historical Context

St. Louis, the city, was a part of St. Louis County until 1876. At that point, the city seceded from the county. The east border of the city is the Mississippi River, the west border extends northeast from the left-hand edge of Forest Park, north to a point ending at Interstate 270, and south to past the intersection of Interstates 44 and 55. The rest of “St. Louis” is part of the county.

The population of St. Louis city is 320,454 people, according to the 2011 UCR. But that is not the number of people who work and do business in the city. Aside from the usual population of tourists found in any large city, St. Louisans often live in the county but work in the numerous public services, hospitals, tourist attractions, and schools of the city. That’s because the metropolitan area of St. Louis continued to expand well after the secession of the city from the county. With space running out in the city, housing and communities were built in the counties, and to this day those are considered the safer, more family-friendly areas to live. Unfortunately, it’s hard to measure the number of people who work in a city–the Bureau of Labor Statistics suggests in non-seasonally adjusted figures for 2011 that 1,295,000 people were employed in “St. Louis, MO-IL.” I’m not sure why Illinois is included in this number, because once you cross the river, you’re in Illinois and East St. Louis, which is a different city entirely (for the record, East St. Louis has a population of 27,087; the city experienced 1,627 reported violent crimes and 2,176 reported property crimes in 2011. For future reference, that’s 140 incidents per 1,000 residents).

Assuming that a simple majority of those 1.295 million workers are employed in the city, it’s easy to see that St. Louis’ population fluctuates broadly every workday. Residents like myself will tell you as much: on weekends and after 5 PM every day, the city is a bit of a ghost town except near the sports centers and tourist areas.

The numbers that are used to rank St. Louis against other cities with populations greater than 100,000–and which showed St. Louis in the number three spot for 2011–use the 320,454 population number, 5,950 reported violent crimes, and 25,669 reported property crimes. That’s 98.7 incidents per 1,000 residents, but there’s no way to separate crimes against non-residents and measure crime per, say, 1,000 individuals present on an average day within the city limits. (The “per 1,000” device is one used often in criminology to quantify crime relative to population.)
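For anyone who wants to check the arithmetic, here is a minimal Python sketch of the “per 1,000” calculation using the 2011 figures quoted above. The daytime population is purely my own illustrative assumption (there is no measured figure for it), included only to show how much the rate depends on the denominator.

```python
def incidents_per_1000(violent: int, property_crimes: int, population: int) -> float:
    """Reported violent plus property crimes per 1,000 people."""
    return (violent + property_crimes) / population * 1000


# 2011 UCR figures quoted in this post
st_louis = incidents_per_1000(5_950, 25_669, 320_454)
east_st_louis = incidents_per_1000(1_627, 2_176, 27_087)

# ASSUMPTION: a made-up count of people present in the city on a workday.
# This is not a real statistic; it only illustrates the denominator effect.
ASSUMED_DAYTIME_POPULATION = 650_000
st_louis_daytime = incidents_per_1000(5_950, 25_669, ASSUMED_DAYTIME_POPULATION)

print(f"St. Louis, per 1,000 residents: {st_louis:.1f}")
print(f"East St. Louis, per 1,000 residents: {east_st_louis:.1f}")
print(f"St. Louis, per 1,000 people present (assumed): {st_louis_daytime:.1f}")
```

The crime counts stay the same in both St. Louis calculations; only the population changes, which is exactly why city-limit rankings are so sensitive to where the borders fall.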

It’s worth mentioning that St. Louis isn’t the only city that suffers from this statistical complication, and also that metropolitan area rankings often rank St. Louis far below the top 10 “high-crime” cities. Here are some more details that should make you think twice before writing a ranking.

Index Crimes and Chicago

The FBI’s UCR includes totals for reported crimes under a grouping known as the index crimes. These are four violent crimes and four property crimes used as benchmarks for change over time: murder, aggravated assault, forcible rape, robbery, motor vehicle theft, arson, burglary, and larceny. Unfortunately, not all police departments report or tally these in uniform ways. As a result, you can see a little footnote at the bottom of all UCR tables indicating that rape statistics for Chicago and the state of Michigan are not recorded the way that the FBI requests.

Reported and Dark Figures

I preface the number of crimes for each category with the word “reported” because journalists need to keep in mind what the UCR actually measures: complaints to the police. This is not an indication of the number of people arrested or the number of convictions. While that might make you think that these figures overestimate the quantity of crime, also keep in mind the criminological rule of thumb: in general, only about half of all crimes are reported. The other half, the unreported crimes, are referred to as the “dark figure” of crime. The most frequently cited campaign to quantify unreported crimes is the National Crime Victimization Survey, a telephone survey of random residents across the United States about their personal experiences with crime. Because this data is not meant for geographical comparisons and trends, it is tallied on the national level, not by state.

Better Stories with UCR Data

Even if you decide that you really don’t want to do a ranking with UCR data, that doesn’t mean you can’t use the numbers at all. There are some great stories lurking in the data for a local crime reporter that your readers can really use.

  • Assuming that your city’s police departments report crimes the same way year after year, this data is very reliable for measuring city-wide trends. Keep in mind that, nationally, crime rates have declined every year for more than two decades–and no one knows why! It’s the great criminological mystery, so be prepared to see your city’s crime rate has been falling as well.
  • The same report featuring the UCR numbers also features numbers for arrests by region and police employees by region.
  • Add to that numbers for hate crimes and police officer injuries and fatalities. Again, it’s more reliable to frame these in comparison to past and future numbers, not in comparison to other regions or cities.

One more thing to note: these numbers are only as accurate as you are cautious. Review methodologies, footnotes, and editor’s notes before deciding how you want to display the data, to be absolutely certain that you are not misrepresenting the numbers.

(I’ll add that, thankfully, the publication I linked to above for ranking St. Louis third in high-crime cities published this article later the same day to add context and caveats to the data, some of which I mentioned above.)

Sexual Harassment, Bullying, and Construction: My Latest for ENR

Illustration by Edel Rodriguez
It took dozens of sources, hundreds of pages of testimony, 70 real-life court cases, and a whole newsroom, but the payoff was worth it. This week, the Engineering News-Record published a special investigation into abusive workplace behaviors and their swiftly changing legal ramifications–with my byline on it. It’s my first cover feature, and I’ll add that it’s also my favorite thing I’ve ever done.

There’s an interesting mix in here of data-driven journalism, investigative reporting, and crowd-sourced information. With the help of a legal researcher, I broke those 70 civil suits into quantifiable data to measure trends and statistics relating to sexual harassment in the construction industry. What we found, in that data and in related EEOC data, was that a relatively high percentage of those cases involved men suing men under Title VII of the Civil Rights Act. That eventually led us to bullying, the most recent human resources hot topic in the trades. Title VII does a terrible job of protecting women who have been subjected to abusive work environments, and it does a worse job for men, particularly in the building trades. Judges frequently argue in their decisions that construction is a rough industry known for its uncivil job-site behavior. But in construction, it’s all the more vital for workers to be free of outside pressures and distractions to avoid safety hazards. Are anti-bullying laws the answer?

We found that experts don’t agree on this. Just as many said the laws are necessary as said they are redundant. And since the laws have yet to be enacted in any individual state, we don’t know for sure whether they will bring equal justice or a flood of frivolous lawsuits. The best advice we could come up with: for managers, establish anti-bullying policies to protect yourselves; for workers, don’t be afraid to take up your grievances with management.

Please read my article, as well as the accompanying sidebar and viewpoint, and the article that kicked all of this off in the first place. The paywall closes soon, unfortunately. Let me know what you think in this post’s comments section!

Infographics and Data Visualization class: infographic design part 2

Here’s the second in a series of posts on the design of an infographic about foreign aid transparency data. This time, I’ve included a panel for a US-centered story, more horizontal graphs, and a unified color scheme.

Infographics and Data Visualization class: infographic design part 1

This week’s assignment for the Knight Center’s MOOC involves designing a visualization around foreign aid donor transparency data. My design is simple, but it involves the inclusion of another set of data to measure the big donors against the little donors (courtesy of AidData 2.0). I’m hoping to make this into a full interactive visualization with Tableau–assuming I get enough complete data from AidData.

Feedback is welcome! First, the complete visualization:

These were the questions I decided that the visualization should answer:

Infographics and Data Visualization class: graphic revision

I’m currently enrolled in the Knight Center’s massive open online course, taught by Alberto Cairo, a well-known infographics designer and instructor. For this week’s assignment, I decided to revise the infographic we critiqued: the New York Times’ visualization of words used by speakers during the past year’s party conventions.

I found it a bit lacking, since the content required a great deal of scrolling and the interactivity was not fulfilling. I felt that with a little more reporting and design, it could provide more context and be less of a wide wilderness for the reader. Here is my suggestion:

Hopefully you’ll forgive the rough quality of my sketches, but I think they do well to convey my idea (Alberto Cairo has encouraged us to render our revisions any way possible–I took the “pencil and paper” suggestion a bit like a challenge!). I combined another student’s suggestion of adjacent bar graphs with a more content-rich dashboard on the left. Upon first visiting the page, the reader would see a deck describing the methodology and a basic display of the most frequently used words and who used them more. Mousing over the boxes that make up the graph displays little quote bubbles with a few of the most notable uses of the word (instead of giving the audience every relatively insignificant use of every keyword). The rate I displayed was one quote per use of the word per 200,000 words spoken at the convention. The boxes were inspired by the New York Times’ user-generated visualization of short responses to the death of Osama bin Laden. Except that this one features a twist:

Text is much more valuable when we can indicate tone, and a majority of these speeches are recorded. If the number of highlighted quotes were manageable, appending a sound bite to each of the boxes would allow the user to click on the box to hear the actual quote. This brings a wealth of comparative value to the visualization, as the simple text doesn’t allow me to compare the tone with which a Republican or Democrat is likely to say something like, “The passage of Obamacare will bring enormous changes to our healthcare system.”

Clicking on a box changes the right-side dashboard to also deliver a layer of context. First, the reader can read the entire speech and, most importantly, see the eventual point being made with the use of the keyword at hand. Second, there’s the opportunity to listen to the quote again. Third, since each individual’s speech is already tagged in the original visualization with the number of times they use each keyword, I think it would be useful to see a graph of the keywords used by the speaker: it gives the reader a good general idea of the speech’s theme. Maybe, if I click on a quote that uses the word “better” and find the quote, “Obama/Romney will make things better,” and then see that the speech used the keyword “economy” the most, I can infer that the speaker’s primary concern was the economic state of America.

The visualization isn’t perfect, of course. I would prefer to display all of the details without relying on two scroll bars, for instance. Having smaller boxes would be preferable, to include as many quotes as possible–but that would mean work for the newsroom. I think there’s probably a reason why the New York Times came up with a relatively light visualization for this news item, so my suggestion is sort of a “what if” projection.

Think it’s better? Worse? Leave a comment!

ENR Internship Clips

I’ve been out of New York for two months now (it feels like forever ago!) and working on a freelance basis with ENR while I get settled in St. Louis. A summary of what I did this summer seems a bit late, but it’s better than never. Here are the articles that I worked on while I had the pleasure of working with the Record’s editors and reporters.

  • Study: Global Contract Disputes Worth Less, Last Longer, June 5, web only
    This was my first story in the newsroom, regarding a study released by contract dispute consulting firm EC Harris in London. It was my first international reporting experience!
  • Despite Transparency, Dispute Erupts on California Library Project, June 25, web and print
    This story started as a simple look into a new library project in Palo Alto, but grew in scope as my reporting turned up webcams, progress reports, city council meetings, and even the original construction and design contracts. They painted a picture of a public works project burdened with change orders and finger-pointing.
  • How Radisson Hotel Owners Stiffed the Subs in Wisconsin, July 13, web with visualization
    This story was one of my most popular, staying at the #8 spot for most visited page on ENR in July. Viewers stayed an average of four and a half minutes, in part because the story also included an interactive timeline in the body. Incorporating the timeline in ENR’s CMS deserves its own blog post. Happily everything worked out and the timeline application has accompanied more stories since I left.
  • Dramatic Digs Mark Panama Canal Expansion Progress by Aileen Cho and Luke Abaffy, July 23, panorama viewer
    I can’t claim any responsibility for this great story, but the viewing windows for the two panoramic images are an example of problem-solving under deadline. My editors asked if I knew a way to display the complete photos, rather than chopped up or shrunk down to fit our CMS’s built-in photo formats. Looking for a quick solution, I worked out a way to use custom-sized iframes to give readers a closer look.
  • Can Leo Linbeck’s Super PAC Remake Congress?, July 30, web and print with slideshow
    For one of the magazine’s first election season articles, I got to interview a Texas construction titan with his own super PAC. The web version got around 800 distinct visits in its first week. Web analytics showed that 10% of the article’s viewers also visited the slideshow, which contained graphics illustrating political spending in the super PAC’s successful primary races.
  • Judges Overturn Same-Sex Harass Verdict Against Boh Bros., July 31, web and print
    This is a write-up of an appellate court’s decision to reverse a verdict of same-sex sexual harassment, which was originally decided against Boh Bros. In a week, this short article had more than 1000 unique visits. It even touched off a wider data-driven investigation into same-sex sexual harassment suits in the construction industry (also by me).
  • Univ. of Ill. Voids Design Contract After Ethics Review, July 31, web and print
    This article netted 1700 unique visits in one week, with visitors staying an average of two minutes to read about the University of Illinois in Champaign-Urbana voiding a design contract for fear of the appearance of conflict of interest in the bidding process. The article also got comments, which is exceedingly rare on ENR since new comment regulations were enacted; and on August 2 it appeared in the Architectural Record, ENR’s sister publication.

All but one of the articles are behind paywalls now. If you’re interested in my work, contact me: I can get copies of the stories in PDF form. Thank you for reading!

Data Project: Scraping OSHA Inspections

One of the most challenging–yet most rewarding–projects I’ve completed for the Engineering News-Record was the building of a PHP scraper using ScraperWiki. It’s a free service for data liberators to build scrapers in PHP, Ruby, or Python that update automatically and are hosted on ScraperWiki’s servers.

I’ve built a scraper to pull construction industry inspection records from the OSHA database. The resulting dataset records some interesting things: the companies with the most violations, for example.

This was my first time handling PHP. Luckily, there are abundant tutorials and tips available on the ScraperWiki website, and an understanding of another coding language helps.

It stands to be a great resource for ENR and could be valuable for any industry-centered magazine. And I invite anyone to explore the data or use my example to create a scraper for OSHA inspections of a different NAICS code. All you have to do is change the associated NAICS code in the URLs to scrape (23 is construction) and adjust the number of times it scrapes in the loop.
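To give a rough idea of what “change the NAICS code and adjust the loop” means, here is a minimal sketch in Python (one of the other languages ScraperWiki supports) rather than the PHP I actually used. It is not my scraper: the base URL, the query parameter names, and the page size are all placeholders I made up for illustration, so check the real OSHA search URLs before adapting it.

```python
from urllib.parse import urlencode

# ASSUMPTIONS: placeholder endpoint and parameter names, not the real OSHA URL.
BASE_URL = "https://example.com/osha/inspections"
NAICS_CODE = "23"        # 23 is construction; swap in another NAICS code here
PAGES_TO_SCRAPE = 3      # how many times the loop runs
RESULTS_PER_PAGE = 100   # assumed number of records per results page


def build_page_urls(naics: str, pages: int, per_page: int = RESULTS_PER_PAGE) -> list[str]:
    """Return one search URL per results page for the given NAICS code."""
    urls = []
    for page in range(pages):
        query = urlencode({"naics": naics, "start": page * per_page, "count": per_page})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_page_urls(NAICS_CODE, PAGES_TO_SCRAPE)
for url in urls:
    # A real scraper would fetch and parse each page here.
    print(url)
```

The only two things that change between industries are `NAICS_CODE` and `PAGES_TO_SCRAPE`; everything else in the loop stays put.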

If you have a data set you would like scraped, I can help you build a scraper, or you can send it my way! I’m always looking for new and newsworthy data to mine. Ideal data for an automated scraper is too large to copy and paste and is displayed in a uniform format online.

My First Tableau Dashboard

NY bridge GPS graphic

A pretty graphic made with Tableau’s mapping function: every bridge in New York. (Copyright 2012 Erin Richey)

One project on my plate right now involves preparing interactive visualization dashboards for the Engineering News-Record, a publication with which I’ve thoroughly enjoyed interning and continue to work as a freelancer. We’re testing out Tableau as a tool for this project, and I myself have been experimenting with Tableau Public to learn about the platform.

I’ve been attempting to upload my first experiment to WordPress, but it seems the free accounts have no support for iframes or JavaScript embeds. In the meantime, you can see my work-in-progress on my Tumblr, which will probably soon fill up with similar tests and embeds that WordPress doesn’t support.

This is my very first attempt with Tableau Public, and I’m impressed with how easy it is; but I know this is just the tip of the iceberg. If you’re a Tableau expert, please lend me (and my fellow dataviz novices) some advice on Twitter!