Biased Music Reviews Give AI an Edge in Genre Prediction

"Streaming platforms will move even further beyond genre. Instead of organizing music by sound, we’ll see it organized by aesthetic or emotion, almost like creating a soundtrack for a lifestyle." - Mauricio Torres

In the age of streaming, using data to quantify our listening habits and personal opinions about music could be viewed as a sterile way of summarizing elusive and complex emotional qualities belonging to the self. In their research, a group of M.S. in Data Science students at the University of Virginia suggests the exact opposite. Data, at its core, examines the most human parts of ourselves without withholding imperfections.

Sam Kunitz-Levy, Heywood Williams-Tracy, Finn Sjue, Mauricio Torres, and Rameez Ali pulled data from Spotify and the popular album review site Pitchfork, and in doing so created a model that is capable of predicting an album's genre by uncovering trends in biased music reviews. Their work reveals that genre cannot evolve past man-made limitations unless we reckon with the inherent bias present in the music review, scoring, and classification process.

Pitchfork Followers By Artist

Q: You’ve essentially built a model that can predict the genre of any album. What questions were you looking to answer through the completion of this project and how did utilizing the Pitchfork and Spotify platforms allow you to do so?  

Sam: This is our final project for Professor Prince Afriyie's Predictive Modeling class. He encouraged us to explore both numeric and categorical variables, with a target variable in mind. 

As a group, we are interested in music, so we decided to scrape Pitchfork's album review data. After collecting review score, reviewer, genre, review length, and several other variables from Pitchfork, we turned to Spotify's API to collect additional numeric variables. We thought it would be interesting to combine the two different data sources into a novel dataset. The first target variable that we tested was the album review score. The linear regression model we made does a decent job, but as you'd expect, it isn't perfect. Professor Afriyie teaches us about many types of models in class—when he taught k-nearest neighbors, we thought we could use it to try to predict the album genre. Once we had the dataset, it was fun to come up with other ideas to test. 
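As a rough illustration of the k-nearest neighbors idea Sam describes, here is a minimal sketch in plain Python. It is not the team's code: the function name `knn_genre_shares`, the two features (a scaled review score and a scaled log follower count), and every row of toy data are assumptions made up for this example. The sketch labels an album by looking at the genres of the most similar albums in the training data.

```python
import math
from collections import Counter


def knn_genre_shares(train, query, k=3):
    """Return each genre's share of the votes among the k nearest training rows.

    `train` is a list of (features, genre) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(genre for _, genre in nearest)
    return {genre: count / len(nearest) for genre, count in votes.items()}


# Toy data, invented for illustration only (not from the Pitchfork dataset).
# Hypothetical features: (scaled review score, scaled log follower count).
train = [
    ((0.80, 0.20), "Experimental"),
    ((0.85, 0.25), "Experimental"),
    ((0.75, 0.30), "Experimental"),
    ((0.60, 0.90), "Pop/R&B"),
    ((0.55, 0.85), "Pop/R&B"),
    ((0.65, 0.95), "Pop/R&B"),
]

shares = knn_genre_shares(train, (0.78, 0.22), k=3)
predicted_genre = max(shares, key=shares.get)
print(predicted_genre, shares)
```

The vote shares double as a crude estimate of genre probabilities. A real model would also need to scale the features consistently and choose k by validation.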

Pitchfork Album Age

Q: In analyzing your data, you noticed a few things: albums scored higher when reviewed long after their release, lesser-known indie artists tended to receive higher ratings than popular artists, and album scores dropped noticeably from around 2016 onward.

How do you think human bias has impacted your data, whether that bias is rooted in nostalgia, genre gatekeeping, or the pursuit of clicks through controversy for monetary gain?

Heywood: Throughout our analysis, we noticed a bias on Pitchfork towards more indie and up-and-coming artists. These reviews tended to be more recent since the songs are new. This is interesting because our Pitchfork data starts in 2016, so there is a high likelihood of bias in terms of scores jumping dramatically. With reviews starting in 2016, it is worth asking whether reviewers are biased towards older songs because they perceive older music as better than music is now.

There are likely many variables that would help explain this, and future analysis should touch on them. These include the age of the reviewer and a sentiment analysis of that reviewer’s work. I think you touched on an interesting point with whether songs reviewed today but produced years ago are inherently different, but a lot of further analysis is needed.

Pitchfork Score Distribution of Album Reviews

Q: The success of your model is dependent upon Pitchfork reviewing music that can only be categorized into respected and previously established genres. Do you think that quantifying art down to a numerical value enforces the limitations of genre or provides space for growth? 

Finn: Grading an album’s quality with a single numerical variable limits our interpretation of what listeners consider 'good' music. However, given the time required to analyze the nuances of every review, we must accept this necessary simplification and rely on a single number to capture a reviewer's holistic assessment.

A similar loss of detail occurs in genre classification, as albums often fit into multiple genres or none at all. Our analysis primarily considers which of the nine official tags (Rock, Electronic, Pop/R&B, Folk/Country, Rap, Experimental, Jazz, Metal, and Global) best fits a given album. Yet we observe that albums that subvert genre expectations, and those by niche artists, tend to receive higher scores than mainstream releases. This indicates that even with these simplified variables, we can still identify an appreciation for artistic complexity.

Pitchfork Review Counts

Q: Spotify has changed the way we listen to music.  

There once was a time when music was consumed in a linear order or on a genre-focused radio station. Now, we have the ability to listen on demand through Spotify features like Daylist or personalized mixes—leading to the creation of hyper-specific genres. How do you see music platforms influenced by algorithmic streaming success continuing to change the listening landscape? 

Mauricio: I’ve noticed that most of the music I discover now comes from Spotify’s algorithm-driven recommended songs, and what’s interesting is that they’re usually spot on with my taste. It’s almost like Spotify knows my music taste better than I do, but this also shows how much control these platforms have over what we hear. For example, when I used the free version, the recommendations felt completely off, almost like they were nudging me toward premium by making the shuffle experience worse. That’s a small example of how the algorithm shapes what we listen to and even how we pay for music through these streaming platforms.

With hyper-specific genres, I think we’re seeing a shift away from traditional labels like “rock” or “R&B” toward extremely niche identities like “pink pilates princess.” Part of this is driven by platforms, but also by reviewers and tastemakers. In our Pitchfork project, our model actually found that Pitchfork tends to give higher scores to these smaller, more niche genres. Whether that’s because they want to be early on the next trend or because niche artists draw more attention, it still feeds back into the algorithm. Once a genre starts getting hype on review sites, Spotify picks up on it and pushes it even more.

Streaming platforms will move even further beyond genre. Instead of organizing music by sound, we’ll see it organized by aesthetic or emotion, almost like creating a soundtrack for a lifestyle. Daylist already does this by naming playlists after oddly specific moods and times of day. The more the algorithm learns, the more it can personalize music around how we live, not just what we listen to. Overall, algorithmic success isn’t just changing genres; it’s reshaping how we discover music, how artists get visibility, and even how we define our own taste. 

Watch an overview of the predictive modeling project here.

Q: Where can people access your project? 

Rameez: Our full Pitchfork album review analysis, which combines Pitchfork review data and Spotify artist information, will be publicly available once we finish integrating the regression, classification, clustering, and interactive components. Check back in about two weeks to explore the final product.

We will be deploying the interactive Shiny app on ShinyApps.io, where users will be able to:

  1. Adjust inputs like log-transformed Spotify followers to see how the predicted score changes
  2. Use the album year and follower-count filters to explore subsets of the data
  3. Toggle predicted genre probabilities
  4. Explore artist clusters using PCA and k-means
  5. Compare reviewers and visualize top-scoring artists and albums

We will also publish the full codebase, the write-up, and our cleaned Pitchfork and Spotify dataset workflow on GitHub at the same time. And of course, a big shout-out to Professor Prince Afriyie, Professor Jonathan Kropko, and the UVA School of Data Science for the feedback and support. We are excited to share the final app soon.


Q: What is your most streamed song of 2025 according to Spotify Wrapped?

Heywood: jupiter by almost monday

Finn: Jesus, Etc. by Wilco

Mauricio: Life 2 by Majid Jordan

Author

Marketing and Communications Coordinator