TruEra CLI – built for rapidly shifting, automated ML pipelines

Written by: Team TruEra
Category: Data Science

A command line interface (CLI) is the easiest way to interact with high-frequency, automated ML pipelines.
The TruEra CLI is built on three core principles that make it easy for both humans and machines to use and allow features to be delivered quickly to all platforms and users.
The TruEra CLI uses the Click Python package to build composable, easy-to-understand command line interfaces.

The tru CLI provides Git-like command-line options for uploading, manipulating, and examining models and data in TruEra, which makes it well suited to a variety of AI pipelines. As ML pipelines become more sophisticated, so must the automation running them, and the TruEra AI Quality platform is ready to integrate into that automation. The TruEra command line interface (tru CLI for short) is a data science tool, but also an integration tool for CI/CD pipelines that rebuild and evaluate new versions of ML models on a daily or weekly basis. The CLI is built on three principles.

Principle 1: input and output should be as natural as possible to its users, both human and machine.

For human users, the CLI has a standardized grammar (tru [verb] [qualifier] [noun]+) that makes for natural commands. The structure allows similar commands and their help text to be grouped together. For example, the command tru get all --help lists the objects available to the get all subcommand.
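
The verb / qualifier / noun grammar can be illustrated with a small nested parser. This is a minimal sketch, not TruEra's implementation: it uses Python's standard argparse module in place of Click to stay dependency-free, and the noun names (projects, data-collections, splits) are assumptions chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for commands shaped like: tru [verb] [qualifier] [noun]+."""
    parser = argparse.ArgumentParser(prog="tru")
    verbs = parser.add_subparsers(dest="verb", required=True)

    # Verb level: "get" groups every read-only command and its help text.
    get = verbs.add_parser("get", help="Read objects")
    qualifiers = get.add_subparsers(dest="qualifier", required=True)

    # Qualifier level: "all" is followed by one or more nouns.
    get_all = qualifiers.add_parser("all", help="List every object of a kind")
    get_all.add_argument(
        "nouns",
        nargs="+",
        choices=["projects", "data-collections", "splits"],  # hypothetical nouns
        help="Object types to list",
    )
    return parser
```

With this layout, build_parser().parse_args(["get", "all", "projects"]) yields the verb, qualifier, and noun list as separate fields, and passing --help after get all prints the allowed nouns, because each level of the grammar owns its own help.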

On the output side, the CLI accommodates both humans and scripts. In the least verbose output mode, the only output is structured JSON, so commands can easily be chained together in scripts that read the output and extract the relevant information. These structured outputs are rarely nested more than one level deep, so they remain readable to humans. Here is a worked example of exploring a deployment, then a specific project, and finally adding a split to it.

Get all projects on the deployment in question; this returns a list.

Get metadata about one of the projects, most importantly the input data format / score type.

Get data collections on the project.

For a chosen data collection, get all the splits currently associated with it.

Finally, add a new split to the data collection.

Notice how the same base commands are used for each type of object, and how the output is readable to humans but could just as easily be passed to a JSON parser.
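
To show how a script might consume such output, here is a minimal sketch of running a command and parsing its JSON result. The command shown is a stand-in that simply prints JSON, not a real tru invocation, and the payload shape (a "projects" key holding a list of names) is an assumption made for illustration.

```python
import json
import subprocess
import sys
from typing import Any, List


def run_json(argv: List[str]) -> Any:
    """Run a command and parse its standard output as JSON."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Stand-in for a CLI call that lists projects; the payload is invented.
FAKE_LIST_PROJECTS = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'projects': ['credit-risk', 'churn']}))",
]

projects = run_json(FAKE_LIST_PROJECTS)["projects"]
first_project = projects[0]  # a script would pass this to the next command
```

Because check=True raises on a non-zero exit code, a pipeline built this way stops at the first failing step instead of parsing partial output.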

Principle 2: the CLI should share as much code with the Python client as possible.

The Python client is a rapidly developing window into TruEra, and part of the reason the CLI and the client have been able to evolve quickly together is that they share almost all of their code. This benefits users because features get developed faster, but it also means that the errors and underlying APIs match, which reduces the chance that the two tools differ in semantics or behavior.

This shared client code has four layers. The first is the user interface layer: the CLI together with the Python client. It takes user inputs and turns them into function calls. The next is the business logic layer, which takes operations the user might want to perform and translates each of them into one or more API calls. The API itself is called by a client layer below this. This client builds protobuf messages and handles reading local files for upload. Finally, the client calls the communicator to perform API calls. The communicator is essentially a wrapper for all network calls that abstracts away the actual logic around requests. Once the communicator is constructed, the client layer and the layers above it do not need to consider how the calls are being made.
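
The layering can be sketched in a few lines. This is an illustrative outline only: every class, method, and request name below is hypothetical, plain dictionaries stand in for protobuf messages, and an in-memory communicator replaces any real network call.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class Communicator(Protocol):
    """Bottom layer: hides how a request reaches the service."""

    def call(self, method: str, request: Dict[str, str]) -> Dict[str, object]: ...


class InMemoryCommunicator:
    """Stand-in transport used here instead of a real network call."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[str]] = {}

    def call(self, method, request):
        if method == "CreateDataCollection":
            self.collections.setdefault(request["project"], []).append(request["name"])
            return {"ok": True}
        if method == "ListDataCollections":
            return {"names": list(self.collections.get(request["project"], []))}
        raise ValueError(f"unknown method: {method}")


@dataclass
class ApiClient:
    """Client layer: builds requests and hands them to the communicator."""

    communicator: Communicator

    def create_data_collection(self, project: str, name: str) -> None:
        self.communicator.call("CreateDataCollection", {"project": project, "name": name})

    def list_data_collections(self, project: str) -> List[str]:
        return self.communicator.call("ListDataCollections", {"project": project})["names"]


@dataclass
class Workspace:
    """Business logic layer: one user operation may need several API calls."""

    client: ApiClient

    def ensure_data_collection(self, project: str, name: str) -> bool:
        """Create the data collection only if it is missing; return True if created."""
        if name in self.client.list_data_collections(project):
            return False
        self.client.create_data_collection(project, name)
        return True


# User interface layer: a CLI command or a Python client call reduces to this.
workspace = Workspace(ApiClient(InMemoryCommunicator()))
created_first = workspace.ensure_data_collection("demo", "training-data")
created_again = workspace.ensure_data_collection("demo", "training-data")
```

Only the line that constructs the communicator knows which transport is in use; the business logic and client layers would be unchanged if a different communicator were passed in.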

There are two implementations of the communicator: one for gRPC calls and one for HTTP calls. This allows gRPC to be used between TruEra microservices, while external clients use HTTP so that they can share a port with the GUI.

The HTTP communicator implementation communicates via REST calls and calls the correct route for each API. The call itself is routed by Kong and translated back to gRPC by an instance of Envoy running within TruEra.

The gRPC communicator holds the stub to the service in question.
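
A rough sketch of the two communicators behind one interface follows. It makes several assumptions: the route table, the ListProjects method name, and the example URL are invented, the HTTP request is delegated to an injected send function rather than a real HTTP library, and a hand-written fake stands in for a generated gRPC stub.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Communicator(ABC):
    @abstractmethod
    def call(self, method: str, request: dict) -> dict:
        """Send one API request and return the response."""


class HttpCommunicator(Communicator):
    """Maps each API method to a REST route; `send` performs the HTTP request."""

    ROUTES = {"ListProjects": ("GET", "/api/projects")}  # hypothetical route

    def __init__(self, base_url: str, send: Callable[[str, str, dict], dict]) -> None:
        self.base_url = base_url.rstrip("/")
        self.send = send

    def call(self, method: str, request: dict) -> dict:
        verb, path = self.ROUTES[method]
        return self.send(verb, self.base_url + path, request)


class GrpcCommunicator(Communicator):
    """Holds a stub for the service and invokes the matching method on it."""

    def __init__(self, stub: object) -> None:
        self.stub = stub

    def call(self, method: str, request: dict) -> dict:
        return getattr(self.stub, method)(request)


def list_projects(communicator: Communicator) -> list:
    """Caller code is identical for either transport."""
    return communicator.call("ListProjects", {})["projects"]


# Fakes standing in for a real HTTP library and a generated gRPC stub.
sent = []


def fake_send(verb: str, url: str, body: dict) -> dict:
    sent.append((verb, url))
    return {"projects": ["demo"]}


class FakeStub:
    def ListProjects(self, request: dict) -> dict:
        return {"projects": ["demo"]}


via_http = list_projects(HttpCommunicator("https://truera.example/", fake_send))
via_grpc = list_projects(GrpcCommunicator(FakeStub()))
```

Injecting the send function and the stub keeps both communicators thin, and it lets code above them be exercised without a running service.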

This "I-shaped" structure of the client code has enabled code reuse and fast development. The top layer (CLI and Python client) is as thin as possible, as is the bottom layer (gRPC and HTTP communicators). Because most of the code lives in the shared core between them, a single implementation serves the full matrix of front ends and transports.

Principle 3: The CLI should use external tools and be usable on any platform.

The CLI itself is written in Python so that it interfaces well with data science notebooks. It can be packaged as a Python wheel or as a standalone Debian or MSI package that brings all of its dependencies along.

These three principles combine to give users access to TruEra's capabilities for ad hoc analysis, structured daily pipelines, and everything in between. They also allow engineers at TruEra to deliver these features quickly to both the Python client and the CLI, on any platform.

Author: Max Reinsel, Engineer

August 24, 2021
