An Awesome Spotify Playlist Analysis
Together with a couple of friends, we’ve created our own personal Awesome Mix Vol. 1. Instead of being a tape with 13 songs, however, ours contains roughly 1,500 songs. Now I’m curious how our musical tastes differ from one another, but also what kind of musical clusters we have created in our playlist.
Let’s get started.
##    Rspotify    spotifyr   tidyverse       knitr  kableExtra    ggthemes
##        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE
## highcharter   htmltools widgetframe     cluster  factoextra        here
##        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE
First, I’ll have to extract the audio features of each song in the playlist. This is where the spotifyr package helps me out. I have removed user names and IDs for privacy reasons.
#tracks <- get_playlist_audio_features("xxxxxxxxxx", playlist_uris = "xxxxxxxxxxxxxxx")
# Keep relevant information
tracks <- tracks %>%
  select(artist_name, track_name, album_name, album_img,
         track_popularity, danceability, energy, loudness,
         speechiness, acousticness, instrumentalness, liveness,
         valence, tempo, key, key_mode, duration_ms,
         time_signature, track_preview_url, track_open_spotify_url)

head(tracks, n = 5) %>%
  kable(format = "html") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive", "condensed"),
                full_width = T,
                position = "left") %>%
  scroll_box(width = "100%")
artist_name | track_name | album_name | album_img | track_popularity | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence | tempo | key | key_mode | duration_ms | time_signature | track_preview_url | track_open_spotify_url |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Editors | All the Kings | IN DREAM | https://i.scdn.co/image/b45a68abbf289097f42b224e10ae834f2547f594 | 43 | 0.440 | 0.539 | -9.039 | 0.0391 | 3.84e-01 | 0.00e+00 | 0.196 | 0.108 | 115.019 | D | D minor | 293562 | 4 | https://p.scdn.co/mp3-preview/bcbe1c796000b1e94a1a67a68ab0caa3b4bdb395?cid=209f3c299b644b06acd255c0166fe5bb | https://open.spotify.com/track/7vsqpQcPaBWzAFvoopHrCd |
alt-J | Tessellate | An Awesome Wave (Deluxe Version) | https://i.scdn.co/image/70b570b709ac08c4d700386ff15030ae88a18678 | 48 | 0.681 | 0.608 | -6.471 | 0.0449 | 3.64e-01 | 4.93e-02 | 0.119 | 0.418 | 116.878 | D | D major | 182667 | 4 | https://p.scdn.co/mp3-preview/106ca0041294360730fcd351c438b35bafdc3196?cid=209f3c299b644b06acd255c0166fe5bb | https://open.spotify.com/track/1QXzQKmQiDOzGHwSXVdHTp |
Weezer | Back To The Shack | Back To The Shack | https://i.scdn.co/image/a44dcda7b7b87761c2b42a3d7eb9a457429a9906 | 9 | 0.435 | 0.706 | -5.310 | 0.0428 | 6.05e-03 | 7.25e-05 | 0.119 | 0.658 | 171.913 | C# | C# major | 186613 | 4 | NA | https://open.spotify.com/track/4pHQSaOkLN3BvHPRjVm8ws |
The Offspring | Want You Bad | Conspiracy Of One | https://i.scdn.co/image/b82ca2c8074ac5dbb560561b9a14578b4087375f | 4 | 0.487 | 0.969 | -4.293 | 0.0505 | 6.59e-05 | 1.20e-06 | 0.278 | 0.626 | 105.539 | E | E major | 202600 | 4 | NA | https://open.spotify.com/track/09ZEB3X2oswrIBBuzuVLEt |
Imagine Dragons | I Bet My Life | I Bet My Life | https://i.scdn.co/image/3db65a1df5dacd133d229141e3527fdf3481c132 | 29 | 0.558 | 0.649 | -8.033 | 0.0389 | 2.29e-01 | 5.23e-04 | 0.312 | 0.570 | 107.894 | C# | C# major | 192893 | 4 | NA | https://open.spotify.com/track/7q2f7lhHTv7j7EFG0vplwA |
Perfect! Almost. I’m missing the Added By column, which does show in our Spotify playlist.

Unfortunately, when I simply tried to copy and paste the complete track overview of the playlist, each record would give me the Spotify link to the song (e.g. https://open.spotify.com/track/17g3YBfU8QfYtkgZGI8tTT) rather than the actual data. A quick Google search for “Export Spotify Playlists” led me to a JavaScript app called Exportify. This worked like a charm and provided me with a downloadable .csv file.
# Import Exportify raw .csv data
mixtape_raw <- read_csv(here("static", "data", "Spotify/awesome_mixtape_1.csv"))
This .csv file did include the Added By column, which we can add to the tracks dataset after some data transformations, sketched roughly below.
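The transformation step itself could look something like this rough sketch; the Exportify column names and the mapping from Spotify user URIs to initials are assumptions for illustration, not the exact original code.

# Rough sketch: the Exportify column names and the user-URI to initials
# mapping below are assumptions; adjust them to the actual export.
mixtape_less_raw <- mixtape_raw %>%
  select(track_name = `Track Name`,
         artist_name = `Artist Name`,
         added_by = `Added By`) %>%
  mutate(added_by = recode(added_by,
                           "spotify:user:xxxxxxxx1" = "G",
                           "spotify:user:xxxxxxxx2" = "M",
                           "spotify:user:xxxxxxxx3" = "V",
                           "spotify:user:xxxxxxxx4" = "S"))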
track_name | artist_name | added_by |
---|---|---|
All the Kings | Editors | G |
Tessellate | alt-J | M |
Back To The Shack | Weezer | M |
Want You Bad | The Offspring | V |
I Bet My Life | Imagine Dragons | M |
Cool. Time to join the two data frames together and start the analysis!
mixtape <- tracks %>%
  inner_join(mixtape_less_raw, by = c("track_name", "artist_name")) %>%
  # keep only tracks with a valence score above zero
  filter(valence > 0)
Awesome Music Analysis
Spotify adds a bunch of music statistics to each song. I’ll be using these statistics to find out how different our music tastes are, and where they differ (if at all… we do share a playlist, after all). I’ll mainly be looking at the following features:
Feature | Description | Values |
---|---|---|
Danceability | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. | 0 to 1 |
Energy | Energy represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. | 0 to 1 |
Valence | A measure describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). | 0 to 1 |
Loudness | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). | -60dB to 0dB |
Tempo | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. | 0 bpm to 250bpm |
Popularity | Although not a musical feature, the popularity index provides a way of determining how popular a song is. The exact description is not provided, but I’m sure it has a well thought-out algorithm underneath. | 0 to 100 |
Cue the violins!
Normally I’d use boxplots to see whether the densities of our musical tastes differ from each other. Except we are dealing with music here, which just screams for violin plots.
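As a rough sketch, the violin plots could be built from the joined mixtape data frame along these lines; the exact feature selection and styling of the original plot are assumptions.

# Rough sketch: reshape the selected features to long format and draw one
# violin per person, one facet per feature (the feature choice is an assumption).
mixtape %>%
  select(added_by, danceability, energy, valence,
         track_popularity, tempo, loudness) %>%
  pivot_longer(-added_by, names_to = "feature", values_to = "value") %>%
  ggplot(aes(x = added_by, y = value, fill = added_by)) +
  geom_violin(show.legend = FALSE) +
  facet_wrap(~ feature, scales = "free_y") +
  labs(x = NULL, y = NULL, title = "Audio feature distributions per person")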
We mostly see that G has a thicker upper end on Valence - he tends to like happier songs than the others. V, specifically, tends to add more songs low on the Valence scale, while M and S are somewhere in between. On Danceability and Energy there is not much to say about the differences - we seem fairly identical in those regards.
Well, I guess that is clear. We have extremely similar tastes when it comes to individual attributes. Popularity, Tempo and Loudness don’t really show any clear distinction among us either.
One thing that is missing, however, is the combination of features! The combination of Energy and Valence in particular can give some very diverse results. In the interactive plot below, all of the songs are plotted with their values on these two attributes. This plot is heavily inspired by the Sentify app created by RCharlie, who attached meaning to the value combinations in an arbitrary way.

A track with high energy and high valence will be an active, happy song, while a low-energy, low-valence song will be a sadder, more depressing one. Anyway, enjoy looking at the songs and our individual tastes, and at which songs classify as happy or sad!
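For reference, a minimal sketch of how such an interactive energy/valence scatter could be built with highcharter might look like the following; the title, axis styling and tooltip of the original plot are assumptions.

# Rough sketch: every track on an energy (x) / valence (y) grid,
# coloured by the friend who added it.
hchart(mixtape, "scatter",
       hcaes(x = energy, y = valence, group = added_by)) %>%
  hc_xAxis(title = list(text = "Energy"), min = 0, max = 1) %>%
  hc_yAxis(title = list(text = "Valence"), min = 0, max = 1) %>%
  hc_title(text = "Energy vs. Valence, per person")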
Most songs seem to be in the upper half of the energy scale, which seems reasonable considering our preference for Rock and Punk-Rock styles. However, we do see a slight difference in the energy/valence combination of songs for V. He tends to like more negatively loaded, energetic songs from artists like Sum 41, Muse or Shinedown. G, especially, seems to enjoy the happier, highly energetic songs from artists such as Smash Mouth, The Strokes, The Kinks and The Bloodhound Gang. M and S are on the more neutral end of the valence spectrum: they prefer songs that are neither overly happy nor overly sad.
Finding Awesome Clusters
So into which musical clusters can we divide our Awesome Mixtape #1 playlist? To answer this question I will be using the K-Means clustering method. The basic idea behind k-means clustering consists of defining clusters so that the total within-cluster variation is minimized.
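In formula form: given clusters $C_1, \dots, C_K$ with centroids $\mu_k$, k-means looks for the assignment of songs to clusters that minimizes the total within-cluster sum of squares,

$$\mathrm{WSS} = \sum_{k = 1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2.$$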
The drawback is that we have to specify the number of clusters we want the data to be divided into. To determine the optimal number of clusters k, I employ both the elbow method and the silhouette method, as seen below. Although the silhouette method suggests k = 2 as the optimal number, that would not give me much detail. Therefore I’ll base my choice on the elbow method, where the line starts bending somewhat at k = 5.
# Centering and Scaling is necessary for the k-means to work properly
cluster_df <- mixtape %>%
  select_if(.predicate = is.numeric) %>%
  map_df(scale)

set.seed(123)
gridExtra::grid.arrange(
  fviz_nbclust(cluster_df, kmeans, method = "wss") +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
    labs(subtitle = "Elbow Method"),
  fviz_nbclust(cluster_df, kmeans, method = "silhouette") +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
    labs(subtitle = "Silhouette Method"),
  ncol = 2)
Creating the final clusters with k = 5.
final_clusters <- kmeans(cluster_df, 5, nstart = 25)

# Attach each track's cluster assignment back to the playlist data
mixtape_clusters <- mixtape %>%
  mutate(cluster = final_clusters$cluster) %>%
  select(cluster, artist_name, track_name, danceability,
         energy, valence, loudness, speechiness,
         acousticness, instrumentalness,
         liveness, tempo, duration_ms)
Now that everything is done, we can start looking at the clusters, and see what kind of distinction the algorithm made.
mixtape_clusters %>%
  group_by(cluster) %>%
  select(-artist_name, -track_name) %>%
  summarize_all(mean) %>%
  mutate_all(round, 3) %>%
  kable(format = "html") %>%
  kable_styling(bootstrap_options = c("hover", "striped",
                                      "responsive", "condensed"),
                full_width = T,
                position = "left") %>%
  scroll_box(width = "100%")
cluster | danceability | energy | valence | loudness | speechiness | acousticness | instrumentalness | liveness | tempo | duration_ms |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0.479 | 0.672 | 0.335 | -7.996 | 0.050 | 0.162 | 0.679 | 0.181 | 126.661 | 317651.7 |
2 | 0.527 | 0.466 | 0.325 | -9.669 | 0.044 | 0.422 | 0.033 | 0.149 | 112.241 | 268738.4 |
3 | 0.395 | 0.836 | 0.394 | -5.084 | 0.071 | 0.030 | 0.022 | 0.242 | 146.018 | 245262.5 |
4 | 0.397 | 0.619 | 0.340 | -7.430 | 0.049 | 0.223 | 0.053 | 0.196 | 132.234 | 256704.8 |
5 | 0.578 | 0.797 | 0.617 | -5.483 | 0.059 | 0.082 | 0.020 | 0.174 | 118.088 | 224196.1 |
Although the distinction is hard to tell this way, I see the following patterns:

- Cluster 1 - Fun, happy, danceable and energetic songs.
- Cluster 2 - Angry, up-tempo songs.
- Cluster 3 - Instrumental, acoustic songs.
- Cluster 4 - High-tempo and energetic instrumental songs.
- Cluster 5 - Far more likely to be live performance songs.
Let’s try to see if this fits with some songs for each cluster!
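One plausible way to pull such examples is to draw a handful of random tracks per cluster; a rough sketch follows (this sampling approach is an assumption, not necessarily how the original tables were produced).

# Rough sketch: draw ten random tracks from every cluster as a sanity check.
set.seed(123)
mixtape_clusters %>%
  group_by(cluster) %>%
  sample_n(10) %>%
  ungroup() %>%
  transmute(cluster, song = paste(artist_name, "-", track_name))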
Cluster 1 | Cluster 2 |
---|---|
The National - Fake Empire | Bear’s Den - New Jerusalem |
The Notwist - Consequence | Ed Sheeran - Little Lady - Mikill Pane |
Paul Kalkbrenner - Sky and Sand | Editors - Ocean of Night |
Porcupine Tree - Lazarus | Causes - Teach Me How To Dance With You |
Dropkick Murphys - 4-15-13 | The Whitest Boy Alive - High On The Heels |
Mando Diao - Black Saturday | The Head and the Heart - Lost In My Mind |
Sum 41 - Exit Song | Lou Reed - Walk on the Wild Side |
Explosions In The Sky - Your Hand In Mine | Pearl Jam - Black |
Muse - Resistance | Bear’s Den - Sophie |
Klangkarussell - Sonnentanz | Kaleo - Save Yourself |
Cluster 3 | Cluster 4 |
---|---|
Bon Jovi - These Days | Lord Huron - Meet Me in the Woods |
Mansun - Wide Open Space | Lord Huron - The Night We Met |
Editors - Spiders | Seafret - Be There |
The Bohicas - Where You At | Tom Grennan - Giving It All |
Foals - What Went Down | Biffy Clyro - The Captain |
Thirteen Senses - Thru The Glass | System Of A Down - Lonely Day |
Wolfmother - Victorious | Donovan - Catch the Wind |
Thirty Seconds To Mars - Vox Populi | The National - Heavenfaced |
U2 - City Of Blinding Lights | Damien Rice - Amie |
Rival Sons - Keep On Swinging | Muse - Dig Down |
Cluster 5 |
---|
The Raconteurs - Steady, As She Goes |
The Proclaimers - I’m Gonna Be (500 Miles) |
Admiral Freebee - Einstein Brain |
Typhoon & New Cool Collective - Bumaye |
The White Stripes - You’re Pretty Good Looking |
Rage Against The Machine - Killing In the Name |
Bob Dylan - Hurricane |
Mumford & Sons - Wilder Mind |
Genesis - Jesus He Knows Me - 2007 Digital Remaster |
The Killers - The Man |
Awesome Conclusion
As a first impression, the songs fit the descriptions I made quite well. I do think that I put too much emphasis on energy. For example, cluster 2 does contain quite negatively loaded songs, but they aren’t necessarily energetic. Cluster 3 seems to fit the acousticness value quite well; the addition of Xavier Rudd, Dermot Kennedy and Luke Sital-Singh confirms the slow, acoustic nature of this cluster.

In Cluster 5 we see only one live-performed song, yet that is still one more than in the other clusters. So perhaps with a bigger sample of songs we’d see more live songs in this cluster.
In the end, I feel like the clusters made some nice distinctions. Perhaps in the future I could add more metadata, or even the sentiment of each song, by analyzing its lyrics with the Genius API.