Tracing Historical Political Leaning in Newspapers with GNNs

Fall 2024 • CS 224W (Machine Learning with Graphs) • 10 weeks

#School #Machine Learning #Graphs #Webscraping #PyTorch Geometric


Description

Building on the Newswire dataset from Dell Research @ Harvard, our team of three studied how newswire activity changed across article topics and decades, and how those changes relate to the polarization of each topic in the United States. The dataset covers newspaper articles, the newspapers that printed them, and their wire sources. We constructed homogeneous and heterogeneous graphs and applied Graph Neural Network (GNN) architectures to generate embeddings that capture newspaper similarity. We then classified the embeddings by political leaning and applied clustering methods to examine how newspaper clusters form around ideology. Lastly, we ran an ablation analysis of each method to assess how effective the different approaches are at modeling newspapers as graphs.
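To give a feel for what "generating embeddings on a homogeneous graph" means, here is a minimal NumPy sketch of a single GCN-style propagation step. It is not the project's PyTorch Geometric code: the toy adjacency matrix, the random features and weights, and the function name `gcn_layer` are all assumptions made for illustration.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)                       # node degree, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization of the adjacency matrix
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

# Hypothetical toy graph: 4 newspapers in a chain, where an edge could
# stand for shared wire content (an assumption, not the real graph).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # made-up node features
weight = rng.normal(size=(3, 2))     # untrained weights, for shape only

embeddings = gcn_layer(adj, features, weight)  # one 2-d vector per newspaper
```

Each newspaper's output vector mixes its own features with those of its neighbors, which is why connected newspapers tend to end up with similar embeddings after training.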

Responsibilities

▶ Constructed features and cleaned the dataset; scraped election results from Dave Leip's Atlas of US Presidential Elections

▶ Trained homogeneous GNNs on this dataset

▶ Evaluated results and examined newspaper node embeddings via dimensionality-reduction techniques (t-SNE, PCA) to better understand model behavior
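The embedding inspection step can be sketched roughly as follows: a plain-NumPy PCA that projects node embeddings down to two dimensions for plotting. The random `fake_embeddings` array stands in for real model output, and `pca_project` is a hypothetical helper, not a function from the project's code.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project embeddings onto their top principal components via SVD."""
    centered = embeddings - embeddings.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    # Fraction of total variance carried by each kept component
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]

# Placeholder data: 50 newspapers with 16-dimensional embeddings (made up).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(50, 16))

coords, explained = pca_project(fake_embeddings)
```

The resulting `coords` can be scatter-plotted and colored by political leaning to see whether newspapers with similar leanings sit close together. t-SNE plays a similar role but preserves local neighborhoods rather than global variance.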

Writeup on Medium. Code on GitHub.