Use of machine learning
Machine learning is central to how we personalize the Home page user experience and connect listeners to the creators that are most relevant to them.
Like many recommendation systems, the Spotify Home page recommendations are powered by two stages:
Content generation by rules, curation, and predictions
Some content is generated via heuristics and rules, some is manually curated by editors, and the rest is generated via predictions from trained models.
Three models:
At a high level, an ML workflow can be broken down into three main phases:
Keeping an up-to-date model
To keep a model up to date (which is more important for some tasks than others; more to come on this), retraining and model versioning are the last steps in our workflow.
Scoring and checking the model
In our Kubeflow pipelines, we have components that check the evaluation score and automatically push the model to production if the score is above our threshold.
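The gating logic described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the actual pipeline component and not the Kubeflow SDK: the names `EvaluationResult`, `gate_and_push`, and the `push` callback are hypothetical, and the sketch assumes a single scalar score where higher is better.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluationResult:
    """Hypothetical container for the output of an evaluation step."""
    model_version: str
    score: float  # assumed: a single offline metric where higher is better


def gate_and_push(
    result: EvaluationResult,
    threshold: float,
    push: Callable[[str], None],
) -> bool:
    """Call `push` with the model version only if the score is above `threshold`.

    Returns True if the model was pushed, False otherwise.
    """
    if result.score > threshold:
        push(result.model_version)
        return True
    return False


if __name__ == "__main__":
    pushed_versions = []  # stands in for a real deployment step
    gate_and_push(EvaluationResult("candidate-a", 0.81), 0.75, pushed_versions.append)
    gate_and_push(EvaluationResult("candidate-b", 0.70), 0.75, pushed_versions.append)
    print(pushed_versions)
```

Passing the deployment step in as a callback keeps the threshold check easy to test in isolation; in a real pipeline the callback would be whatever component promotes the model to production.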
Training and serving features
Historically, we have had one set of infrastructure for fetching and transforming features during experimentation (training) and a different set of infrastructure for fetching and transforming features for making predictions (serving).
Detecting drift
The alerting we have added to our data validation pipeline allows us to detect significant differences in our feature sets. It uses the Chebyshev distance metric, which measures the largest absolute difference between two vectors along any single dimension, and can help alert us to drift between training and serving features.
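As a rough illustration of a Chebyshev-based drift check, the sketch below compares two equal-length vectors of feature statistics and flags drift when the largest per-dimension difference exceeds a threshold. The function names, the choice of what the vectors contain, and the threshold are all assumptions made for the example; the source does not describe how the validation pipeline is configured.

```python
from typing import Sequence


def chebyshev_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the largest absolute per-dimension difference between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # default=0.0 makes two empty vectors trivially identical
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def drift_detected(
    training_stats: Sequence[float],
    serving_stats: Sequence[float],
    threshold: float,
) -> bool:
    """Flag drift when the Chebyshev distance exceeds `threshold` (hypothetical rule)."""
    return chebyshev_distance(training_stats, serving_stats) > threshold


if __name__ == "__main__":
    # Assumed inputs: normalized category frequencies for one feature
    training = [0.10, 0.50, 0.40]
    serving = [0.12, 0.20, 0.68]
    print(chebyshev_distance(training, serving))
    print(drift_detected(training, serving, threshold=0.1))
```

Because the metric takes the maximum over dimensions, a large shift in a single value is enough to trigger an alert even when every other value is unchanged.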