A multi-label song classifier that predicts playlist assignments across 11 genre and mood categories. It is built on Sentence-BERT embeddings and a multi-output MLP, and is preceded by a full OCR-based data recovery pipeline that reconstructed an 18,000-song library corrupted during a platform migration.

| Component | Details |
|---|---|
| Embedding | `all-MiniLM-L6-v2`: 384-dimensional Sentence-BERT vectors |
| Classifier | `MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=300)` |
| Wrapper | `MultiOutputClassifier`: one independent binary classifier per label |
| Class Balancing | Upsample with replacement to max class size before training |
| Runtime | ~10 min on Colab T4 GPU for 241K songs |
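
The components in the table can be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the helper names (`embed_texts`, `upsample_to_max`, `build_classifier`), the choice of input text, and the fixed random seeds are assumptions made for illustration. In particular, the balancing step is one plausible reading of "upsample to max class size" for multi-label data: positive rows of each label are resampled with replacement until the label reaches the largest label's count.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neural_network import MLPClassifier

EMBED_DIM = 384  # output size of all-MiniLM-L6-v2
N_LABELS = 11    # playlist categories


def embed_texts(texts):
    """Encode strings into 384-dim vectors (needs the sentence-transformers package)."""
    # Lazy import so the rest of the sketch runs without the optional dependency.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return np.asarray(model.encode(list(texts)))


def upsample_to_max(X, Y, seed=0):
    """Append resampled positive rows until each label reaches the largest label's count.

    Assumed interpretation of the balancing step. Because one song can carry
    several labels, some labels may end up above the target.
    """
    rng = np.random.default_rng(seed)
    target = int(Y.sum(axis=0).max())
    extra = []
    for j in range(Y.shape[1]):
        pos = np.flatnonzero(Y[:, j])      # rows where label j is set
        deficit = target - len(pos)
        if len(pos) > 0 and deficit > 0:
            extra.append(rng.choice(pos, size=deficit, replace=True))
    idx = np.concatenate([np.arange(len(X))] + extra)
    return X[idx], Y[idx]


def build_classifier(max_iter=300):
    """One independent MLP per label, using the settings listed in the table."""
    base = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=max_iter, random_state=0)
    return MultiOutputClassifier(base)
```

Under these assumptions, training would look like `X = embed_texts(texts)`, then `Xb, Yb = upsample_to_max(X, Y)`, then `clf = build_classifier().fit(Xb, Yb)`, where `Y` is a binary matrix with one column per playlist category. `clf.predict(...)` returns a matrix of the same width, one 0/1 decision per label.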