Sentient launches ‘Arena’ to benchmark AI agents in real-world business workflows

Pantera, Franklin Templeton join Sentient Arena to test AI agents

Written by

Nazia Saeed

Reviewed by

Raghav Chopra

Updated 13:19 EST

Feb. 27, 2026

Sentient has launched Arena, a production-style platform for testing AI agents on business tasks. Pantera and Franklin Templeton are among the first companies to participate.

Pantera Capital and Franklin Templeton’s digital assets division have joined the first cohort of Arena, a new testing environment from open-source AI lab Sentient. Arena is designed to measure how well AI agents perform in business-style workflows.

Sentient said in a Friday release that Arena is a production-style benchmarking platform, not a static model test. Instead of scoring agents only against predetermined datasets, it gives them standardised tasks that resemble the conditions of a real business, such as long documents, missing information, and conflicting sources.

Shared testing framework tracks hallucinations, citation errors and logic gaps

Oleg Golev, product lead at Sentient Labs, said, “In this first phase, participation means supporting the Arena program and developer cohort.”

He noted that partners are helping to define what “production-ready reasoning” means for document-heavy work such as compliance, analysis, and operations. The companies are not disclosing how much money they plan to put into the project.

Companies are accelerating their use of AI agents in research and operational workflows, even as governance frameworks lag behind.

The Celonis 2026 Process Optimisation Report, published on February 4, says that 85% of the senior business leaders surveyed want their companies to become “agentic enterprises” within three years, but only 19% already use multi-agent systems.

Not static grading, but production-style evaluation

Golev said that Arena is a shared platform where developers submit AI agents to complete standardised tasks and then compare the results under the same testing conditions.

The platform tracks failure types such as hallucinations, missing evidence, incorrect citations, and gaps in logic, which lets developers identify recurring problems.

Arena plans to publish a public leaderboard with performance metrics that compare participants, along with postmortems that list common failure modes and ways to address them.

OpenRouter and Fireworks are among the infrastructure partners providing inference compute for the first cohort. Other partners are helping with tools and training.

Crypto and payments players expand AI-driven infrastructure

The launch comes as financial and crypto companies experiment with giving AI systems more autonomy in handling money.

On Wednesday, MoonPay rolled out infrastructure that lets AI agents create wallets and handle stablecoin transactions.

Stripe’s leaders said on Thursday that blockchains, the decentralised digital ledgers used to record transactions, may need substantial work to scale if AI-driven commerce grows.
