
Clustering-Music

Started: 2026-02-23


Project Progress 100%

About this project

Clustering Music Genres – Machine Learning Project

Author: Leon Motaung

Technologies Used: Python, Pandas, NumPy, Scikit-learn, K-Means Clustering, MinMaxScaler, Plotly

Objective

The objective of this project is to group songs by similarities in their audio characteristics using unsupervised machine learning, independently of their genre labels. By analyzing patterns in musical features, the system identifies clusters of songs that share similar sonic properties.

Problem Overview

Music streaming platforms face challenges in recommending songs due to diverse and subjective user preferences. Traditional demographic data alone is insufficient to understand a listener’s taste.

By analyzing the audio characteristics of songs a user listens to, it becomes possible to identify patterns and recommend similar music. This project applies clustering techniques to group songs with comparable musical features.

Dataset Description

The dataset used in this project consists of popular Spotify songs, including metadata such as artist, genre, and year, along with numerical audio features like tempo, energy, loudness, and danceability.

The numerical audio features are the inputs used to measure similarity between songs; the metadata columns are kept for interpreting the resulting groups.

Methodology

Loaded the Spotify music dataset using Pandas.
Removed irrelevant columns such as index values.
Explored correlations between audio features.
Selected key numerical audio attributes for clustering.
Scaled features to a uniform range using Min-Max scaling.
Applied K-Means clustering to group songs into segments.
Mapped cluster labels to readable segment names.
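
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the data is randomly generated in place of the Spotify file, and the column names (including the "Index" column) are assumptions based on the feature names used in this write-up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Spotify dataset (random values, not real songs).
rng = np.random.default_rng(0)
n_songs = 50
songs = pd.DataFrame({
    "Index": np.arange(1, n_songs + 1),  # hypothetical index column
    "Beats Per Minute (BPM)": rng.integers(60, 200, n_songs),
    "Loudness": rng.integers(-20, 0, n_songs),
    "Liveness": rng.integers(0, 100, n_songs),
    "Valence": rng.integers(0, 100, n_songs),
    "Acousticness": rng.integers(0, 100, n_songs),
    "Speechiness": rng.integers(0, 60, n_songs),
})

# Remove the column that carries no audio information.
songs = songs.drop(columns=["Index"])

# Select the numerical attributes used for clustering.
feature_columns = [
    "Beats Per Minute (BPM)", "Loudness", "Liveness",
    "Valence", "Acousticness", "Speechiness",
]

# Explore pairwise correlations between the audio features.
correlations = songs[feature_columns].corr()

# Rescale every feature to the [0, 1] range.
scaled = MinMaxScaler().fit_transform(songs[feature_columns])
```

With the real dataset, the synthetic DataFrame would be replaced by a `pd.read_csv(...)` call on the project's data file.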

Machine Learning Approach

This project uses K-Means clustering, an unsupervised learning algorithm that groups data points based on distance to cluster centroids. Songs with similar audio characteristics are assigned to the same cluster.

The clustering process was performed using selected features such as:

Beats Per Minute (BPM)
Loudness
Liveness
Valence
Acousticness
Speechiness
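
To make the idea of "distance to cluster centroids" concrete, here is a small NumPy sketch of one K-Means iteration. The three points and two centroids are made-up values for illustration only; scikit-learn runs this kind of assign-and-update loop internally until the centroids stop moving.

```python
import numpy as np

# Three hypothetical songs, each described by two scaled features.
points = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]])
# Two hypothetical starting centroids.
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])

# Assignment step: Euclidean distance from every point to every centroid,
# giving an array of shape (n_points, n_centroids).
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)

# Update step: each centroid moves to the mean of the points assigned to it.
new_centroids = np.array(
    [points[labels == k].mean(axis=0) for k in range(len(centroids))]
)
```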

Model Training and Clustering

The K-Means algorithm was configured to create ten distinct clusters. Each cluster represents a group of songs with similar musical traits.

Number of Clusters: 10
Clustering Type: Unsupervised Learning
Algorithm: K-Means
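
A configuration along these lines could look like the sketch below. The input is random data standing in for the scaled feature matrix, and `n_init` and `random_state` are assumptions added for reproducibility; the write-up does not state which values the project used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Random stand-in for the scaled feature matrix: 200 songs, 6 features in [0, 1].
rng = np.random.default_rng(0)
scaled_features = rng.random((200, 6))

# Ten clusters, as described above. n_init and random_state are assumed values.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled_features)

# One integer label per song, and one centroid per cluster.
centers = kmeans.cluster_centers_
```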

Results

After clustering, a new column named Music Segments was added to the dataset. Each song was assigned to a cluster representing its musical similarity group.

Songs from different genres often appeared in the same cluster, indicating that audio characteristics can transcend traditional genre labels.
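
The labeling step and the genre comparison can be sketched as follows. The six rows, the "Top Genre" column name, and the "Cluster N" naming scheme are hypothetical; only the Music Segments column name comes from the project description.

```python
import pandas as pd

# Hypothetical songs with genre metadata and K-Means labels.
songs = pd.DataFrame({
    "Top Genre": ["pop", "rock", "pop", "hip hop", "rock", "jazz"],
    "cluster": [0, 0, 1, 1, 2, 2],
})

# Map numeric labels 0..9 to readable names (the naming scheme is an assumption).
segment_names = {label: f"Cluster {label + 1}" for label in range(10)}
songs["Music Segments"] = songs["cluster"].map(segment_names)

# Count how many songs of each genre fall into each segment.
genre_by_segment = pd.crosstab(songs["Music Segments"], songs["Top Genre"])

# Number of distinct genres present in each segment.
genres_per_segment = (genre_by_segment > 0).sum(axis=1)
```

A segment with more than one genre in `genres_per_segment` is an instance of the cross-genre grouping described above.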

Visualization

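A 3D interactive visualization was created using Plotly to explore the clusters along three axes: BPM, Energy, and Danceability. Energy and Danceability serve only as plotting axes here; they are not among the six clustering features listed above. The plot shows how the segments are distributed in this feature space.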

Key Learnings

Unsupervised learning can reveal hidden patterns in music data.
Audio features are powerful indicators of musical similarity.
K-Means clustering is a practical way to segment a music dataset into a chosen number of groups.
Data scaling is essential for distance-based algorithms.
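
The point about scaling can be illustrated with a small made-up example. The values below are hypothetical BPM and loudness readings: before scaling, the wide BPM range dominates the Euclidean distance, and after Min-Max scaling the nearest neighbor of the first song changes.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical songs; columns are BPM and loudness in dB.
songs = np.array([
    [60.0, -15.0],   # sets the minimum of both columns
    [200.0, -3.0],   # sets the maximum of both columns
    [120.0, -5.0],   # song A
    [125.0, -14.0],  # song B: similar tempo, much quieter
    [150.0, -5.0],   # song C: faster, same loudness
])
a, b, c = songs[2], songs[3], songs[4]

# Raw distances: the BPM difference dominates, so B looks closer to A than C does.
raw_ab = np.linalg.norm(a - b)
raw_ac = np.linalg.norm(a - c)

# After scaling each column to [0, 1], the loudness gap carries real weight.
scaled = MinMaxScaler().fit_transform(songs)
sa, sb, sc = scaled[2], scaled[3], scaled[4]
scaled_ab = np.linalg.norm(sa - sb)
scaled_ac = np.linalg.norm(sa - sc)
```

Here `raw_ab < raw_ac` but `scaled_ab > scaled_ac`, so the same K-Means run could place song A in a different cluster depending on whether the features were scaled.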

Insights

Genres Are Not Absolute

Songs from different genres can share similar audio characteristics, making clustering a valuable alternative to traditional genre classification.

Feature Selection Matters

Choosing the right audio features significantly impacts the quality and interpretability of the clusters.

Clustering Supports Recommendation Systems

Grouping music by audio similarity gives streaming platforms a basis for recommending songs that sound like what a listener already plays, which can make recommendations more personalized.

Project Structure

data/ – Spotify music dataset
analysis/ – Data exploration and clustering scripts
visualization/ – Interactive cluster plots
README.md – Project documentation

This project demonstrates how clustering techniques can be applied to music data to uncover meaningful patterns and support recommendation systems using machine learning.