Information-Theoretic Measures of Music Listening

October 27, 2014

Having scraped a lot of playlists and listening histories from last.fm, I was keen to see which music features were associated with the way users made playlists, or how their listening changed over time. This user-centred approach to feature selection and system evaluation was published at ISMIR 2014. Some example code and interactive demos follow. Read the paper.


Analysing User Listening History

This example shows a user's listening behaviour over several years. The entropy over features is computed for a moving window of track plays, indicating how diverse the listening was over each feature. Taking artist IDs as example, throughout 2007-2010, the entropy is often low, with some sessions of just one artist. In February 2010, the user began using online radio, listening to tracks irrespective of artist, causing a distinct change in entropy. Note that they still sometimes continued their old style of music retrieval as well.

Example code

library(RSQLite)
library(infotheo)
library(reshape)
library(ggplot2)
library(changepoint)
library(zoo)
library(RColorBrewer)

# Query the dataset
con <- dbConnect(SQLite(), "spud.sqlite")
history <- dbGetQuery(con, "SELECT tl.date, t.* FROM lastfmtracklistens as tl JOIN tracks as t ON tl.track=t.trackid WHERE tl.user=2275;")

# Sort listens by date 
history <- history[with(history, order(date)),]

# We need discrete features for entropy functions
history[,'popularity'] <- discretize(history[,'popularity'],nbins=25)
history[,'duration'] <- discretize(history[,'duration'],nbins=25)

# Select a feature
f <- zoo(history$artist)
time(f) <- as.POSIXlt(history$date)

# Calculate the windowed entropy
H <- rollapply(f, 20, entropy, by=20, partial=TRUE)
Hdf <- fortify(H)

# Find a changepoint
cp <- Hdf[attributes(cpt.meanvar(Hdf$H))$cpts[1],]$Index
Hdf$aftercp <- Hdf$Index > cp
historyPlot <- ggplot(data=Hdf, aes(x=Index, y=H)) + geom_point(alpha=0.7, aes(colour=aftercp)) + ...

Playlist-based Feature Selection

This example shows how to identify music features relevant to how users made the 10,000 playlists in the dataset. We retrieved music features for each song and calculate the Mutual-Information of music features with playlist membership.


Example code

library(infotheo)
library(RSQLite)
library(ggplot2)

# Calculate Adapted Mutual Information
# We normalise against feature entropy only, and not class (playlist)
# Refer to the paper for more details
getAMI <- function(feature, class) {
  I <- mutinformation(class, feature)
  n0 <- mutinformation(class[sample.int(length(class))], feature)
  #return((I - n0) / (max(entropy(class), entropy(feature)) - n0))
  return((I - n0) / (entropy(feature) - n0))
}

# Query the dataset
con <- dbConnect(SQLite(), "spud.sqlite")
# Our entropy view joins playlists with music features
# These are propriertary and not currently included in the SPUD dataset
data <- dbGetQuery(con, "SELECT * FROM [Entropy];")

# Select the features we're interested in
features <- data[,6:(ncol(data)-4)]
class <- data$playlist

# Calculate AMI between features and class
fami <- apply(features,2,getAMI,class)

Dataset

The SPUD dataset contains 19,225 playlists created by Last.fm users. Spotify URI and song metadata are included for every song. Full listening history is also included for a number of users. The dataset is in SQLite database format and is available under the MIT open source license:

Download dataset

Written on October 27, 2014