About Spinroom

Measuring media bias is a complex and elusive challenge. Bias isn’t a simple metric that can be captured with a straightforward algorithm or reduced to black-and-white terms. It often hides in subtle forms: the underlying assumptions woven into a story, the choice of language, the perspective from which an issue is framed, or even what’s covered—or left out—entirely. Complicating matters further, perceptions of bias are inherently subjective. What one person sees as balanced reporting, another might view as skewed. Media outlets naturally must select what to cover and craft narratives that engage their audience, but determining how much weight to give each angle—say, the story of a criminal immigrant versus a vulnerable deportee, or Trump’s provocative rhetoric versus his policy substance—is rarely an objective decision. There’s no universal "right" answer to these questions. Our moral frameworks, shaped by evolution to help us coexist, vary slightly in their priorities, meaning there’s seldom a single "correct" lens on most issues.

At its core, our mission isn’t about stripping media of its stylistic flair or forcing every sentence into a universally agreeable box. Nor do we aim to mandate that every side of every story be covered equally in every article. Instead, we believe that, overall, media should reflect the diversity of perspectives in proportion to their prevalence. The goal is balance across a body of work—not the erasure of opinion, but a fair representation of the spectrum of views.

Traditional approaches to assessing bias, like tallying keyword frequencies, offer a narrow glimpse but often miss the bigger picture. Surveys, while insightful, are costly, inconsistent, and struggle to capture a truly representative sample. This is where large language models (LLMs) come in. Using state-of-the-art models like GPT-4o, Gemini 2.0 Flash, and Claude 3.7 Sonnet, we’re developing automated, scalable methods to evaluate media bias holistically. Unlike rigid metrics, LLMs can analyze entire articles, providing repeatable, quantitative results that remain consistent across vast datasets. This allows us to map the broader distribution of narratives within a publication or across outlets.

How We Use LLMs

We’ve designed a series of targeted surveys in which models like GPT-4o, Gemini 2.0 Flash, and Claude 3.7 Sonnet evaluate media content across distinct dimensions of bias, such as overall bias or favorability toward particular political parties. More specifically, we use four different surveys, the prompts for which are shown below.
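To illustrate the general shape of such a survey, here is a minimal sketch in Python. The template wording, scale, and names (`SURVEY_PROMPTS`, `build_prompt`) are our illustrative assumptions, not the actual prompts used:

```python
# Hypothetical survey templates; wording and scales are illustrative assumptions.
SURVEY_PROMPTS = {
    "overall_bias": (
        "Rate the overall political bias of the following article on a scale "
        "from -5 (strongly left-leaning) to +5 (strongly right-leaning). "
        "Respond with a single number.\n\nArticle:\n{article}"
    ),
    "party_favorability": (
        "Rate how favorably the following article portrays the {party} party, "
        "from -5 (very unfavorable) to +5 (very favorable). "
        "Respond with a single number.\n\nArticle:\n{article}"
    ),
}

def build_prompt(survey: str, **fields: str) -> str:
    """Fill the chosen survey template with the article text and any other fields."""
    return SURVEY_PROMPTS[survey].format(**fields)
```

Each filled-in prompt is then sent to the model, and the numeric response is parsed and recorded, allowing the same question to be posed identically across thousands of articles.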

Bias Score

The bias score we calculate aims to measure media bias by examining both the average leaning and the consistency of coverage. Rather than simply averaging scores, we remove outliers, fit a normal distribution to the remaining scores, and assess the probability mass falling on either side of neutrality (zero). This approach accounts for the larger magnitudes of bias typical of opinion formats. A wide range of perspectives can indicate balanced coverage, whereas consistently similar scores suggest persistent favoritism. However, interpreting this score as a single number has limitations; it is most insightful when complemented by visualizations and considered within the broader context of the format and coverage choices.
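The calculation above can be sketched as follows. This is an illustrative implementation only: the IQR outlier rule, the mapping to [-1, 1], and the function name `bias_score` are our assumptions, not the exact formula used:

```python
import statistics
from math import erf, sqrt

def bias_score(scores):
    """Illustrative sketch: trim outliers, fit a normal distribution,
    and compare the probability mass on each side of neutrality (zero).

    Returns a value in [-1, 1]: negative = left-leaning mass dominates,
    positive = right-leaning mass dominates, 0 = balanced.
    """
    # Trim outliers with a simple 1.5 * IQR rule (an assumed choice).
    s = sorted(scores)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    iqr = q3 - q1
    trimmed = [x for x in s if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]

    # Fit a normal distribution to the remaining scores.
    mu = statistics.mean(trimmed)
    sigma = statistics.stdev(trimmed) or 1e-9  # guard against zero spread

    # P(X < 0) under the fitted normal, via the standard normal CDF.
    p_left = 0.5 * (1 + erf((0 - mu) / (sigma * sqrt(2))))

    # Net mass: P(X > 0) - P(X < 0).
    return 1 - 2 * p_left
```

Under this sketch, a symmetric spread of scores around zero yields a score near 0 even if individual articles are strongly opinionated, while uniformly one-sided scores push the result toward +1 or -1.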

Limitations

No method is flawless, and ours is no exception. LLMs can stumble on adversarial examples, performing unexpectedly poorly on isolated cases. However, such isolated mistakes become less significant at scale. Another challenge is the potential bias within the LLMs themselves—studies suggest some models may favor certain demographics or political leanings, despite efforts by developers to mitigate this through post-training. Additionally, detecting bias often requires contextual knowledge; if an article misrepresents facts or omits critical perspectives, it can be difficult—even for humans—to spot without background. To address this, we’re experimenting with including "ground truth" sources, like a politician’s full speech, as context for evaluating related reporting.

Another challenge in evaluating bias is comparing different formats. Longer content will generally have less variance in bias; additionally, what "bias" means in a news format (where it might be more subtle or based on what is covered) versus an opinion format (where it might be more about ensuring the overall body of work is balanced) is not always clear. While LLMs provide a way to put these different formats on a common scale, people might have differing interpretations of what constitutes a fair comparison.

Moreover, the categories of "left" and "right" upon which many bias analyses are based represent a simplification of complex political leanings, and these terms have become increasingly ill-defined in contemporary discourse. Finally, while LLMs excel at assessing bias in text holistically, some dimensions—like coverage bias (what stories are chosen or ignored)—remain harder to quantify.

Conclusion

Our goal is to build a robust, LLM-powered framework that brings transparency to media bias. By combining cutting-edge technology with a nuanced understanding of bias’s complexity, we aim to provide tools that help readers, researchers, and journalists better navigate the media landscape—not to dictate a singular truth, but to illuminate the range of perspectives shaping our world.

Contact

If you have any questions or suggestions for how we can improve the site, please contact us at contact@spinroom.org.