Reading the News Anew
Background
The idea for this project came from my experience of reading The New York Times' coverage of Africa while living in Ghana in 2009. At the time, I was struck by how much the amount and depth of reporting in the Times differed from that of local news outlets in Ghana and of the BBC. My subjective impression was that the Times lacked coverage of important cultural, political, and economic events in Africa. In this project, I explore whether that impression holds up empirically by analyzing the frequency and length of news articles published in the "World" section of The New York Times in 2010.
Visualization technique
Each article is displayed as an ellipse whose area is determined by the number of words in the article. This emphasizes the difference between a 100-word regional briefing and a front-page story. The visualization suggests that there are regional tendencies in both the frequency and the depth of reporting by the Times.
The month view gives the user an idea of the cumulative regional news coverage, as well as the ability to pick out some significant news events. In January, for example, the Americas have a visibly dense cluster of articles, most of which pertain to the Haiti earthquake. As the animation continues, the month view morphs to display only articles in particular topic categories (Political, Economic, Social, War, and Disaster).
Individual articles can be explored by mousing over each article's ellipse.
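The area encoding described above can be sketched as follows. This is a minimal illustration, not the code behind the visualization: the function name `ellipse_axes` and the parameters `area_per_word` and `aspect` are assumptions introduced here. The point is that the semi-axes must grow with the square root of the word count for the area to stay proportional to it.

```python
import math


def ellipse_axes(word_count, area_per_word=1.0, aspect=1.0):
    """Return (semi_major, semi_minor) for an ellipse whose area is
    proportional to word_count.

    The area of an ellipse is pi * a * b. With a = aspect * b, solving for b
    gives b = sqrt(area / (pi * aspect)). Both parameters are hypothetical
    tuning knobs, not values taken from the project.
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    area = word_count * area_per_word
    semi_minor = math.sqrt(area / (math.pi * aspect))
    semi_major = aspect * semi_minor
    return semi_major, semi_minor


# A 1,200-word story covers 12 times the area of a 100-word briefing,
# but its axes are only sqrt(12) times as long.
briefing = ellipse_axes(100)
feature = ellipse_axes(1200)
```

Scaling the axes linearly with word count instead would exaggerate long articles, since the area would then grow with the square of the word count.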
Data methodology
The data for this project was culled from The New York Times Article Search API, using the following query terms:
- Query terms: nytd_section_facet:[World]&begin_date=20100101&end_date=20101215
- Returned fields: title, url, geo_facet, nytd_geo_facet, nytd_des_facet, des_facet, classifiers_facet, word_count, byline, date, day_of_week_facet, small_image, page_facet, source_facet
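As a rough sketch, a request with these terms could be assembled like this. The query string, date range, and field list come from the list above; the endpoint URL is a placeholder, and the parameter names `query`, `fields`, `offset`, and `api-key` are assumptions about the Article Search API version in use at the time, so check them against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the real Article Search URL.
BASE_URL = "https://api.example.invalid/article-search"

# Fields listed in the methodology above.
FIELDS = [
    "title", "url", "geo_facet", "nytd_geo_facet", "nytd_des_facet",
    "des_facet", "classifiers_facet", "word_count", "byline", "date",
    "day_of_week_facet", "small_image", "page_facet", "source_facet",
]


def build_request_url(api_key, offset=0, base_url=BASE_URL):
    """Build one page of the request; urlencode handles the ':' and '[]'."""
    params = {
        "query": "nytd_section_facet:[World]",
        "begin_date": "20100101",
        "end_date": "20101215",
        "fields": ",".join(FIELDS),
        "offset": offset,      # assumed paging parameter
        "api-key": api_key,    # assumed authentication parameter
    }
    return base_url + "?" + urlencode(params)


url = build_request_url("YOUR_KEY")
```

Building the query with `urlencode` rather than by string concatenation avoids hand-escaping the brackets and colon in the facet query.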
From a raw total of 10,927 articles, the following categories of articles were dropped from the sample set:
- Names of the Dead (197)
- Slideshow/Interactive pieces (2,529)
- Duplicates by same title, different url (119)
- Duplicates by same title, different publication date (583)
- Articles with no regional identification, as determined by title and classifiers_facet fields (1,275)
Regional groupings are those of The New York Times editorial staff, and were determined by parsing the "classifiers_facet" field and, for non-classified articles, the title (e.g. "WORLD BRIEFING | AMERICAS").
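The cleaning steps above might look roughly like the following. This is a sketch under stated assumptions, not the script used for the project: the region labels in `REGIONS`, the shape of the `classifiers_facet` values, and the sample records are all hypothetical, and the slideshow/interactive filter is omitted because the text does not say how those pieces were identified.

```python
import re

# Assumed region labels; the actual groupings are those of the Times.
REGIONS = ("AFRICA", "AMERICAS", "ASIA", "EUROPE", "MIDDLE EAST")

# Matches titles such as "WORLD BRIEFING | AMERICAS: ...".
BRIEFING = re.compile(r"^WORLD BRIEFING \| ([A-Z ]+)", re.IGNORECASE)


def region_of(article):
    """Region from classifiers_facet, else from a briefing-style title."""
    for classifier in article.get("classifiers_facet", []):
        for region in REGIONS:
            if region in classifier.upper():
                return region
    match = BRIEFING.match(article.get("title", ""))
    if match:
        label = match.group(1).strip().upper()
        for region in REGIONS:
            if label.startswith(region):
                return region
    return None


def clean(articles):
    """Drop 'Names of the Dead', repeated titles, and region-less articles."""
    seen_titles = set()
    kept = []
    for article in articles:
        title = article.get("title", "").strip()
        if title.lower().startswith("names of the dead"):
            continue
        if title.lower() in seen_titles:  # same title, other URL or date
            continue
        seen_titles.add(title.lower())
        region = region_of(article)
        if region is None:
            continue
        kept.append({**article, "region": region})
    return kept


# Hypothetical records for illustration only.
sample = [
    {"title": "Names of the Dead", "url": "u1", "classifiers_facet": []},
    {"title": "WORLD BRIEFING | AMERICAS: Example", "url": "u2", "classifiers_facet": []},
    {"title": "WORLD BRIEFING | AMERICAS: Example", "url": "u3", "classifiers_facet": []},
    {"title": "Example Story", "url": "u4", "classifiers_facet": ["Top/News/World/Africa"]},
    {"title": "Untagged Story", "url": "u5", "classifiers_facet": []},
]
cleaned = clean(sample)
```

Here the first occurrence of a title is kept and later ones are dropped, which is one possible reading of the two duplicate rules in the list.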
Each topic category (Political, Social, Economic, War, Disaster) was determined by grouping over 1,000 unique tags from the "des_facet" field in the Times API. With this method, 81% of articles could be filed in one of the five high-level categories. (1,029 articles were without des_facet.)
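The tag grouping can be pictured as a lookup from "des_facet" tags to the five categories. The sketch below uses a handful of made-up tag names in `TAG_TO_CATEGORY`; the actual mapping covered over 1,000 tags and is not reproduced here.

```python
# Hypothetical tag-to-category lookup, for illustration only.
TAG_TO_CATEGORY = {
    "ELECTIONS": "Political",
    "INTERNATIONAL TRADE": "Economic",
    "EDUCATION": "Social",
    "MILITARY FORCES": "War",
    "EARTHQUAKES": "Disaster",
}


def categorize(des_facet):
    """Return the sorted categories matched by an article's tags.

    An article can land in more than one category, or in none if its
    tags are missing or unmapped.
    """
    matched = {
        TAG_TO_CATEGORY[tag.upper()]
        for tag in des_facet
        if tag.upper() in TAG_TO_CATEGORY
    }
    return sorted(matched)


def filed_share(articles):
    """Share of articles filed under at least one category."""
    if not articles:
        return 0.0
    filed = sum(1 for a in articles if categorize(a.get("des_facet", [])))
    return filed / len(articles)


# Hypothetical records: two of the four can be filed.
sample = [
    {"des_facet": ["Earthquakes", "Elections"]},
    {"des_facet": []},
    {"des_facet": ["Unmapped Tag"]},
    {"des_facet": ["Education"]},
]
share = filed_share(sample)
```

A share computed this way corresponds to the 81% figure reported above, with articles lacking "des_facet" counting as unfiled.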
The following table shows the distribution of articles by continent across all topic categories. We perform regression analysis with fixed effects for each unique day of news coverage in order to determine whether each continent's average articles per day and words per article differ from the overall average across continents.
- -- Indicates averages which are statistically significantly below average (at the 5 percent level)
- ++ Indicates averages which are statistically significantly above average (at the 5 percent level)
- * Totals are the total number of articles analyzed; some articles may be tagged in more than one region or topic category.
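One way to write down a regression of this kind for words per article is given below. It is an assumed specification offered for orientation, not necessarily the exact model that was estimated; all symbols are introduced here.

```latex
w_i = \mu + \sum_{c} \beta_c \,\mathbf{1}[c(i) = c] + \delta_{d(i)} + \varepsilon_i,
\qquad \sum_{c} \beta_c = 0, \qquad \sum_{d} \delta_d = 0
```

Here $w_i$ is the word count of article $i$, $c(i)$ its continent, $d(i)$ its publication day, $\mu$ the overall average, $\delta_d$ the fixed effect for day $d$, and $\beta_c$ the deviation of continent $c$ from the overall average. Under this reading, a "++" or "--" marker corresponds to rejecting $\beta_c = 0$ at the 5 percent level with a positive or negative estimate. The articles-per-day comparison would use the same form with the daily article count for each continent as the outcome.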
Future work
Though this project began as a way to quantify regional news coverage, it has become an interesting way to read the news. The ability to quickly mouse over many articles and see their titles allows for broader exploration of the news than a traditional page format, where space is limited to 20 or so articles. In the month view, you can select from more than 500 articles and quickly understand something about the content of the article, the context of the article (what else was going on internationally at the time), and the significance of the article (is it front-page news or a few lines in the "World" section that will go unnoticed by most readers?).
The project also raises interesting questions: Do the frequency and depth of reporting reflect an actual bias in coverage, or are they simply an objective reflection of where "news" happens? Future work might include comparing this view of the Times to a similar one from the Guardian, comparing the source of articles across regions (e.g. is the article from the AP, Reuters, or the Times?), and analyzing historical data.