Achieving breadth, depth, and speed to analyze and manage models at scale
To build a powerful AI Quality platform that meets customer needs, TruEra’s solution needed to:
- be universal in approach, so that it can analyze the broad diversity of models that customers need,
- support deeper model explanations and analytics, and
- achieve model analytics performance equivalent to native integration.
Many enterprises train, deploy, and maintain a large number of models. These models often cater to diverse use cases, such as fraud detection, credit risk modeling, marketing, demand forecasting, and product recommendations. They are built using different ML frameworks, often across various teams in the organization; examples include scikit-learn, XGBoost, Spark MLLib, catboost, lightgbm, PMML, DataRobot, MLeap, SAS, R, and external model APIs. It is important for a powerful AI Quality Platform to support these different ML frameworks and provide efficient, consistent, and comparable analytics.
Today, the TruEra AI Quality system provides best-in-class ML explainability and AI Quality analytics. When we started building a model execution platform to power our software service two years ago, the first question we had to tackle was:
How do we generically represent an ML model and build an AI Quality analytics engine around it?
We first looked at popular approaches and frameworks already used in the industry for model execution. We found that most model execution platforms fall into one of the following categories, each with its own strengths and drawbacks:
- Single-Framework Platforms support a single model framework or a small set of similar frameworks; TFX is one example. These platforms tend to focus on optimizing prediction performance. However, building on such one-off execution frameworks would make it very hard to build a generic AI Quality platform.
Figure 1 – TensorFlow model
An example of a TensorFlow SavedModel, which can be served via TFX. While the format is language-independent, it supports only TensorFlow models.
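For illustration, here is a minimal sketch of exporting a Keras model in the SavedModel format and serving it with TensorFlow Serving. This is not from the original post; the architecture, paths, and server flags are illustrative, and API details vary by TensorFlow version.

```python
import tensorflow as tf

# A toy Keras model; the architecture and input shape are placeholders.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Export in the framework-specific SavedModel format. TensorFlow Serving
# expects a numeric version directory under the model's base path.
tf.saved_model.save(model, "/models/my_model/1")

# Serving then happens outside Python, e.g.:
#   tensorflow_model_server --rest_api_port=8501 \
#       --model_name=my_model --model_base_path=/models/my_model
```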
- Model Execution Services execute models as microservices, often in containers; examples include MLFlow and BentoML. The model is packaged into an image along with all of its dependencies and, possibly, custom launch logic. The execution service then launches this image as a container, which usually exposes an API endpoint and is scored remotely. While these services support a large set of model frameworks and enable reproducible model execution, the model API often consumes a blob and produces a blob, with no strict schema. This simplifies the infrastructure by putting the onus of defining the model’s input and output on the model author, but it makes it hard to reason about individual features and their influence on the outcomes. Furthermore, the model internals are hidden in this representation, which precludes white-box analysis of the model; white-box analysis is often computationally more efficient and, for certain models, provides deeper analytics. Hence this approach does not allow generic, deeper model analytics.
Figure 2 – MLFlow model format
The MLFlow model format allows representing any model as a flavor. Specifically for Python, any object exposing a `predict` API can be served as a model.
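To make the blob-in, blob-out pattern concrete, here is a minimal sketch of MLFlow’s generic `pyfunc` flavor; the scoring logic and paths are placeholders, not TruEra’s internal code.

```python
import mlflow.pyfunc
import pandas as pd

class MyModel(mlflow.pyfunc.PythonModel):
    """Any custom logic can hide behind the generic predict() API."""

    def predict(self, context, model_input: pd.DataFrame) -> pd.Series:
        # Placeholder scoring logic; a real model would be loaded from
        # context.artifacts and invoked here.
        return model_input.sum(axis=1)

# Package the model and its dependencies into a self-contained directory.
mlflow.pyfunc.save_model(path="my_pyfunc_model", python_model=MyModel())

# Load and score it generically: the caller sees only predict(), not the internals.
loaded = mlflow.pyfunc.load_model("my_pyfunc_model")
print(loaded.predict(pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})))
```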
- Common Intermediate Format approaches convert trained models to one intermediate format, such as PMML or ONNX, and use dedicated scoring servers for these formats. Many model frameworks allow exporting trained models, along with the transformation pipeline, into such a generic intermediate representation. In principle, an AI Quality system built against these generic representations could work with a diverse set of model frameworks and support deeper model analytics. In practice, however, we found that, over time, most mature organizations build their own custom layers and wrappers around existing frameworks and tools. Users may also combine multiple models in custom ways and analyze the combination as a whole. Converting all of these models to an intermediate representation is often unsupported or far from straightforward.
Figure 3 – Logistic-regression model using PMML
An XML representation of a logistic-regression model in PMML. Any model exported to valid PMML can be served by a PMML scoring server.
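As a sketch of the intermediate-format workflow, here is one common way to export a scikit-learn logistic regression to PMML, assuming the sklearn2pmml package and a Java runtime are available; the dataset is a placeholder.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True, as_frame=True)

# Wrap the estimator in a PMMLPipeline so the fitted pipeline can be exported.
pipeline = PMMLPipeline([("classifier", LogisticRegression(max_iter=1000))])
pipeline.fit(X, y)

# Convert to the intermediate PMML representation (uses Java under the hood).
sklearn2pmml(pipeline, "LogisticRegressionIris.pmml")
```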
Finding the best path forward – the best of all worlds
The TruEra AI Quality Platform combines the best of these three worlds. It provides a new path forward that
- retains the universality and reproducibility of the “Model Execution Service” approach above,
- augments it with the richer model representation of the “Common Intermediate Format” approach, which supports deeper model explanations and analytics, and
- achieves model analytics performance equivalent to native integration, as in the “Single-Framework Platforms” approach.
For a generic representation of models in the system, we decided to use the open-source MLFlow model format as a starting point. The MLFlow format is generic enough to support a wide variety of ML frameworks and provides reproducible, containerized model execution. This allows us to support all popular ML frameworks, spanning multiple programming languages, as well as arbitrary custom model representations; examples include scikit-learn, XGBoost, Spark MLLib, catboost, lightgbm, PMML, DataRobot, MLeap, SAS, R, and external model APIs.
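A minimal sketch of how this generic representation works in the open-source MLFlow format (the flavor, paths, and data are illustrative): a model saved with a framework-specific flavor can be loaded and scored through the framework-agnostic interface.

```python
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = GradientBoostingClassifier().fit(X, y)

# Save with the framework-specific flavor...
mlflow.sklearn.save_model(model, "gbm_model")

# ...but load and score through the framework-agnostic pyfunc interface,
# which is what lets one engine treat many frameworks uniformly.
generic = mlflow.pyfunc.load_model("gbm_model")
print(generic.predict(X[:3]))
```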
The TruEra AI Quality Platform augments this representation in several ways, including adding input-output schemas that support the interpretation of model inputs and outputs, enable validation checks, and allow model explanations and analytics to be computed with the right types of objects. The platform also provides a white-box representation for tree models and deep neural networks, giving it access to model internals, not just their inputs and outputs. This enables the TruEra AI Quality Platform to support deeper and more efficient model explanations and analytics. While augmenting the representation, we have ensured that a model saved in TruEra remains compatible with any serving platform that supports the open-source MLFlow format.
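TruEra’s augmented representation itself is not public, but the open-source MLFlow format offers a related mechanism that illustrates the idea: a model signature recording input and output schemas. A minimal sketch, assuming a scikit-learn model:

```python
import mlflow.sklearn
from mlflow.models.signature import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True, as_frame=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Record the input and output schemas alongside the model, so downstream
# tooling can interpret and validate features instead of passing opaque blobs.
signature = infer_signature(X, model.predict(X))
mlflow.sklearn.save_model(model, "lr_with_schema", signature=signature)
```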
In summary, TruEra model execution meets customer needs for a powerful AI Quality platform. It is:
- reproducible, as it is built on the self-contained, open-source MLFlow model format;
- universal, providing multi-language, multi-framework, and pluggable model execution support;
- performant, providing fast and scalable model analytics; and
- an enabler of deep AI Quality evaluation, leveraging a richer representation for models than the standard MLFlow format.
Authors: Anupam Upadhyay, Max Reinsel, and Anupam Datta