Free, open source software TruLens is the fastest, easiest way to evaluate the performance of LLM applications based on foundation models like GPT
Redwood City, Calif. – May 24, 2023 – TruEra, which provides software to test, debug, and monitor ML models across the full MLOps lifecycle, today launched TruLens for LLM Applications, the first open source testing software for apps built on Large Language Models (LLMs) like GPT.
LLMs are emerging as a key technology that will power a multitude of apps in the near future – but there are also growing concerns about their use, with prominent news stories about LLM hallucinations, inaccuracies, toxicity, bias, safety, and potential for misuse.
TruLens addresses two major pain points in LLM app development today:
Experiment iteration and champion selection are too slow and painful. The workflow for building LLM applications involves significant experimentation. After developing the first version of an app, developers manually test and review answers; adjust prompts, hyperparameters, and models; and re-test, over and over again, until a satisfactory result is achieved. The process is often challenging, and the final winner is not always clear.
Existing testing methods are inadequate, resource-intensive, and time-consuming.
One of the main reasons that experiment iteration is challenging is that existing tools for testing LLM apps are ineffective. Direct human feedback is the most common testing method in use today. While direct human feedback is a useful first step, it can be slow, patchy, and difficult to scale. TruLens leverages a new approach it calls feedback functions – a programmatic way of evaluating LLM applications at scale – to enable teams to test, iterate on, and improve their LLM-powered apps quickly.
“TruLens feedback functions score the output of an LLM application by analyzing generated text from an LLM-powered app and metadata,” explained Anupam Datta, Co-founder, President and Chief Scientist at TruEra. “By modeling this relationship, we can then programmatically apply it to scale up model evaluation.”
TruLens for LLMs can help AI developers:
- Improve the efficacy of LLM usage for their applications
- Reduce the “toxicity” or potential social harm of LLM results
- Evaluate information retrieval performance
- Flag biased language in application responses
- Understand the dollar cost of their application’s LLM API usage
TruLens provides feedback functions that can evaluate:
- Question answering relevance
- Harmful or toxic language
- User sentiment
- Language mismatch
- Response verbosity
- Fairness and bias
- Other custom feedback functions created by the user
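To make the idea of a feedback function concrete, the following is a minimal sketch in plain Python. It does not use the TruLens API. The function names, the 0-to-1 score range, the word limit, and the averaging rule for picking a champion are all assumptions chosen for illustration only.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Assumption: a feedback function maps a (prompt, response) pair to a score in [0, 1].
FeedbackFn = Callable[[str, str], float]
Record = Tuple[str, str]  # (prompt, response)


def verbosity(prompt: str, response: str, max_words: int = 50) -> float:
    """Hypothetical verbosity check: 1.0 within max_words, lower as length grows."""
    n_words = len(response.split())
    if n_words <= max_words:
        return 1.0
    return max_words / n_words


def keyword_relevance(prompt: str, response: str) -> float:
    """Crude relevance proxy: share of distinct prompt words found in the response."""
    def words(text: str) -> set:
        cleaned = {w.lower().strip(".,?!") for w in text.split()}
        cleaned.discard("")
        return cleaned

    prompt_words = words(prompt)
    if not prompt_words:
        return 0.0
    return len(prompt_words & words(response)) / len(prompt_words)


def evaluate(records: List[Record], feedbacks: Dict[str, FeedbackFn]) -> Dict[str, float]:
    """Average each feedback function's score over all logged records."""
    return {name: mean(fn(p, r) for p, r in records) for name, fn in feedbacks.items()}


def pick_champion(results: Dict[str, Dict[str, float]]) -> str:
    """Pick the app version with the highest mean score across feedback functions."""
    return max(results, key=lambda version: mean(results[version].values()))


if __name__ == "__main__":
    feedbacks = {"verbosity": verbosity, "relevance": keyword_relevance}
    # Hypothetical logged outputs from two versions of the same app.
    version_a = [("What is TruLens?", "TruLens is open source software.")]
    version_b = [("What is TruLens?", "I am not sure.")]
    results = {
        "version_a": evaluate(version_a, feedbacks),
        "version_b": evaluate(version_b, feedbacks),
    }
    print(results)
    print("champion:", pick_champion(results))
```

Because each check is an ordinary function, the same scoring can be rerun over every logged response after each prompt or model change, which is what lets evaluation scale beyond manual review. A production feedback function would likely rely on a model-based scorer instead of the word-overlap heuristic used here.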
“LLM-based applications are taking off and will only become more prevalent,” said Datta. “TruLens can help developers build high performing applications and get them to market faster. TruLens does this by validating the effectiveness of the LLM for their application’s use case and mitigating the possible harmful effects that LLMs can have. It fills a hole in the emerging LLMOps tech stack.”
TruLens is free and available for download at trulens.org. For a quick walkthrough of how to get started, go to “Evaluate and Track Your LLM Experiments: Introducing TruLens.”
TruEra helps companies to build and maintain better ML models, faster. TruEra provides an award-winning suite of AI Quality software for testing, debugging, and monitoring ML models across the ML model lifecycle. Powered by enterprise-class Artificial Intelligence (AI) evaluation technology based on over six years of research at Carnegie Mellon University, the TruEra platform helps drive higher-performing models that achieve measurable business results, minimize unfair bias, and ensure governance and compliance. To learn more, visit truera.com.