How do you debug and address model drift?

What is model drift?

When we deploy and monitor machine learning models in production, we often see the models’ outputs change as new data comes in. This phenomenon is referred to as model drift. It occurs when the distribution of a model’s predictions on pre-production data differs from the distribution of its predictions on production data.

How does model drift relate to our debugging process? Let’s adopt the following four-step debugging cycle:

Evaluate the model’s performance on production data
Narrow the problem scope to a specific form of model drift
Analyze your drift with an appropriate metric
Mitigate drift based on the root cause you identified

Types of model drift

Figure: Types of model drift and their underlying causes

To evaluate model drift and narrow it down to a root cause, we must understand the different types of model drift. Generally, there are two types to consider:

Model decay: The relationship between the model’s predictions and the labels has changed (e.g., accuracy on production data has dropped).
Prediction shift: The distribution of the model’s predictions has changed.
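The following minimal Python sketch makes the distinction concrete. It assumes a binary classifier that outputs scores in [0, 1] and uses a 0.5 decision threshold. The helper names (`prediction_shift`, `model_decay`) are hypothetical. The mean score and accuracy are only one simple choice of statistic for each symptom. Prediction shift can be checked without labels, while model decay needs production labels:

```python
from statistics import mean

def prediction_shift(ref_scores, prod_scores):
    """Difference in mean predicted score: a label-free check for prediction shift."""
    return mean(prod_scores) - mean(ref_scores)

def accuracy(scores, labels, threshold=0.5):
    """Fraction of thresholded predictions that match the binary labels."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def model_decay(ref_scores, ref_labels, prod_scores, prod_labels, threshold=0.5):
    """Change in accuracy from pre-production to production (requires labels)."""
    return accuracy(prod_scores, prod_labels, threshold) - accuracy(ref_scores, ref_labels, threshold)

# Identical predictions in both periods, but production labels no longer agree with them.
ref_scores = [0.9, 0.8, 0.2, 0.1]
ref_labels = [1, 1, 0, 0]
prod_scores = [0.9, 0.8, 0.2, 0.1]
prod_labels = [1, 0, 1, 0]

print(prediction_shift(ref_scores, prod_scores))
print(model_decay(ref_scores, ref_labels, prod_scores, prod_labels))
```

In this toy case, the predictions are the same in both periods, so there is no prediction shift. Accuracy still falls from 1.0 to 0.5, which means the model has decayed.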

This brings up an important point: in most cases, model drift is a symptom of other types of drift. There are three potential root causes of model drift that we should be mindful of:

Data/covariate drift occurs when the distribution of a feature in your pre-production data has shifted in your production data.
Concept drift happens when the relationship between a feature and the label (the target your model predicts) in your pre-production data has shifted in your production data.
Label drift denotes a change in your production label distribution when compared to pre-production.
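The toy simulation below sketches how these root causes differ. It assumes a single feature `x` drawn from a normal distribution and a label defined by a threshold rule. `make_data` and `summarize` are hypothetical helpers, and P(y = 1 | x > 0) is used as a coarse stand-in for the feature–label relationship. Covariate drift moves the feature distribution while leaving the labelling rule alone; concept drift does the reverse. The labels here are derived from `x`, so the label rate moves in both scenarios. This is a reminder that these root causes often show up together:

```python
import random
from statistics import mean

def make_data(n, x_mean=0.0, boundary=0.0, seed=0):
    """Hypothetical toy data: x ~ N(x_mean, 1) and label y = 1 when x > boundary."""
    rng = random.Random(seed)
    xs = [rng.gauss(x_mean, 1.0) for _ in range(n)]
    ys = [int(x > boundary) for x in xs]
    return xs, ys

def summarize(xs, ys):
    """Simple statistics that react to different root causes of drift."""
    labels_where_x_pos = [y for x, y in zip(xs, ys) if x > 0.0]
    return {
        # Shifts under data/covariate drift (the input distribution changes).
        "feature_mean": mean(xs),
        # Share of positive labels; any change here is label drift.
        "label_rate": mean(ys),
        # Coarse proxy for P(y = 1 | x): shifts under concept drift.
        "p_y_given_x_pos": mean(labels_where_x_pos),
    }

reference = summarize(*make_data(10_000))
# Covariate drift: inputs move to N(1, 1), labelling rule unchanged.
covariate = summarize(*make_data(10_000, x_mean=1.0, seed=1))
# Concept drift: inputs unchanged, but the labelling rule moves to x > 1.
concept = summarize(*make_data(10_000, boundary=1.0, seed=2))

for name, stats in [("reference", reference), ("covariate", covariate), ("concept", concept)]:
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

In the covariate scenario, `feature_mean` moves to about 1 while `p_y_given_x_pos` stays at 1. In the concept scenario, `feature_mean` stays near 0 while `p_y_given_x_pos` falls to roughly 0.3.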

Metrics to measure drift

Now that we understand model drift’s potential root causes, we need to choose a metric to analyze and quantify drift. We have written about choosing drift metrics in more detail elsewhere. At a high level, you will want to consider the following questions:

Are the features numerical or categorical? Your choice of metric depends heavily on whether your features are numerical or categorical. Additionally, the metric will require you to make assumptions about your data (e.g., type of distribution) and to choose parameters (e.g., bin size).
How interpretable is the metric? Your chosen metric should be interpretable to your stakeholders. Some metrics are in the same units as the feature (e.g., dollars), while others are dimensionless or require domain knowledge to interpret. The former may be more interpretable to non-technical stakeholders, while the latter might tell you more as a data scientist.
Are there rare values or categories in your data? Some metrics are insensitive to small changes in the distribution. If you want to monitor a rare value or category, such metrics may fail to capture drift in it.
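The sketch below illustrates the rare-category point with made-up frequencies. It compares total variation distance and KL divergence for two production distributions that each differ from the baseline by the same absolute amount. In one, a rare category becomes ten times more common; in the other, a common category shifts slightly. Computing KL in the direction KL(production || baseline) is an assumption; the reverse direction is also used in practice.

```python
import math

def total_variation(p, q):
    """Total variation distance: half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

baseline = [0.500, 0.499, 0.001]      # hypothetical pre-production category frequencies
rare_jump = [0.500, 0.490, 0.010]     # rare category becomes 10x more common
common_nudge = [0.509, 0.490, 0.001]  # same absolute change, but only among common categories

for name, prod in [("rare_jump", rare_jump), ("common_nudge", common_nudge)]:
    print(name, round(total_variation(prod, baseline), 4), round(kl_divergence(prod, baseline), 4))
```

Total variation distance is 0.009 in both cases, so it cannot tell the two apart. KL divergence is much larger for the rare-category jump because it weights each change by the log ratio of the probabilities.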

To help you use these questions in your selection process, we have compiled a table of drift metrics below.

Metric	Data Type	Interpretation	Useful for Rare Values?
NormDistance	Categorical/Numerical	Distance in units of feature (where distance type is controlled by “p”)	No
Total Variation Distance	Categorical/Numerical	Same as above, where p=1	No
Difference of Means	Numerical	Average difference in units of feature	No
Wasserstein/“Earth Mover” Distance	Numerical	Amount of “work” needed to “move” one distribution into another in units of feature	No
Relative Entropy/KL Divergence	Categorical/Numerical	Amount of relative information (in bits/nats)	Yes
Jensen-Shannon Distance	Categorical/Numerical	Weighted sum of the KL divergences of each distribution from a “midpoint” between the distributions (in bits/nats)	Yes
Kolmogorov-Smirnov Test	Numerical	Dimensionless statistic, can be used to perform null hypothesis test	No
Chi-square Test	Categorical	Dimensionless statistic, can be used to perform null hypothesis test	No
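For reference, here are minimal pure-Python sketches of the metrics in the table. They are written without third-party dependencies so the arithmetic is visible. They are illustrations, not TruEra's implementations.

In practice you would likely use SciPy, which provides `scipy.stats.wasserstein_distance`, `scipy.stats.ks_2samp`, `scipy.stats.chisquare`, `scipy.stats.entropy` (KL divergence when given two distributions), and `scipy.spatial.distance.jensenshannon`. The distribution-based functions below assume you have already turned each feature into a probability vector. That means category frequencies for a categorical feature, or histogram bins for a numerical one, where the bin size is a parameter you choose.

```python
import math
from bisect import bisect_right
from statistics import mean

# --- Metrics on two probability vectors (category frequencies or histogram bins) ---

def norm_distance(p, q, order=2):
    """L^order distance between probability vectors; `order` plays the role of "p"."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1 / order)

def total_variation(p, q):
    """Half the L1 distance between the two probability vectors."""
    return 0.5 * norm_distance(p, q, order=1)

def kl_divergence(p, q, base=math.e):
    """KL(p || q); base 2 gives bits, base e gives nats. Assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b, base) for a, b in zip(p, q) if a > 0)

def jensen_shannon_distance(p, q, base=math.e):
    """Square root of the average KL divergence of p and q from their midpoint m."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return math.sqrt(0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base))

def chi_square_statistic(expected_counts, observed_counts):
    """Pearson chi-square statistic (no p-value) for observed vs. expected category counts."""
    return sum((o - e) ** 2 / e for e, o in zip(expected_counts, observed_counts))

# --- Metrics on two raw samples of a numerical feature ---

def difference_of_means(x, y):
    return mean(y) - mean(x)

def _ecdf(sorted_sample, t):
    """Empirical CDF: fraction of the sample that is <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)

def ks_statistic(x, y):
    """Kolmogorov-Smirnov statistic: largest vertical gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(_ecdf(xs, t) - _ecdf(ys, t)) for t in xs + ys)

def wasserstein_1d(x, y):
    """Earth mover's distance in 1D: area between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    grid = sorted(set(xs + ys))
    # Both ECDFs are constant on each interval [a, b) between consecutive sample points.
    return sum(abs(_ecdf(xs, a) - _ecdf(ys, a)) * (b - a) for a, b in zip(grid, grid[1:]))
```

For example, `wasserstein_1d([1, 2, 3], [2, 3, 4])` and `difference_of_means([1, 2, 3], [2, 3, 4])` both evaluate to 1 (up to floating-point rounding), since the second sample is the first shifted by one unit. `ks_statistic` on the same samples gives 1/3.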

How to address model drift?

Now that we know some of the causes of model drift, what can we do to mitigate it? We have a few options, including:

Do nothing: While counterintuitive, a good first step is to do nothing to your initial model. This initial model will serve as a baseline as we go through the other potential countermeasures.
Fix quality issues: Before retraining your model, you should make sure to address any data quality issues. This treatment involves spotting inconsistencies in your data pipeline (e.g., categorical data changing to numerical data). To avoid situations like this, build data quality checks into your pipeline!
Retrain (or fine-tune) your model: This option entails retraining your current model using more (or different) data. You might consider splitting your incoming production data into training/testing data to supplement your pre-production training/testing data, respectively. Then, you can retrain your model and re-evaluate it on the test split. Alternatively, you can fine-tune the model trained on pre-production data using new training data from production.
Regularize your model: This solution can be effective if you suspect your model drift is due to overfitting. If this is the case, then retraining with a regularized loss function could help you reduce model drift.
Remove a feature from your model: This treatment can be effective in mitigating either data or concept drift. For data drift, you can “freeze” a drifting feature by replacing it with its pre-production mean. “Freezing” the feature’s contribution might improve performance without needing to retrain. For concept drift, you can remove the feature from your pre-production data and retrain your model. This approach may not work if other features are correlated with the removed feature, so be mindful of feature interactions!
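Several of these countermeasures come down to simple data transformations. The sketch below shows one possible version of the quality check, the production split for retraining, and the freeze/remove-feature treatments. It assumes each row is a Python dict mapping column names to values. Every function name and the example schema are hypothetical. Model training itself is left out because it depends on your framework.

```python
import random
from statistics import mean

def check_schema(rows, expected_types):
    """Fix quality issues: report missing fields and type changes vs. a hand-written schema."""
    problems = []
    for i, row in enumerate(rows):
        for column, expected in expected_types.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: '{column}' is missing")
            elif not isinstance(value, expected):
                problems.append(
                    f"row {i}: '{column}' is {type(value).__name__}, expected {expected.__name__}"
                )
    return problems

def build_retraining_sets(pre_train, pre_test, production_rows, test_fraction=0.2, seed=0):
    """Retrain: split production rows and append them to the pre-production train/test sets."""
    shuffled = list(production_rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    prod_train, prod_test = shuffled[n_test:], shuffled[:n_test]
    return pre_train + prod_train, pre_test + prod_test

def freeze_feature(production_rows, pre_production_rows, feature):
    """Data drift: replace a (numerical) drifting feature with its pre-production mean."""
    frozen_value = mean(row[feature] for row in pre_production_rows)
    return [{**row, feature: frozen_value} for row in production_rows]

def drop_feature(rows, feature):
    """Concept drift: remove a feature before retraining on pre-production data."""
    return [{k: v for k, v in row.items() if k != feature} for row in rows]

schema = {"income": float, "region": str}
production_rows = [
    {"income": 52000.0, "region": "north"},
    {"income": 61000.0, "region": 3},  # category silently re-encoded as a number
    {"region": "south"},               # field dropped upstream
]
for problem in check_schema(production_rows, schema):
    print(problem)
```

For the sample rows, `check_schema` prints `row 1: 'region' is int, expected str` and `row 2: 'income' is missing`. The freeze and drop helpers return new rows rather than mutating the originals, so the pre-production data stays available as a baseline.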

Try it yourself!

From selecting your drift metrics to keeping track of models, TruEra can help make debugging model drift simple. Get free, instant access to TruEra Diagnostics to get your model drift under control!

