Beyond Context: Closing the AI Value Gap

I recently heard Ashley Kramer, VP of Enterprise at OpenAI, say something that stuck with me:

‍“Model capabilities aren’t the issue. There’s a huge gap between what models can provide, and the value enterprises are actually extracting.”

That observation crystallized something I’d been thinking about for a while. We now have models with extraordinary intelligence on tap — enough to write entire software systems on their own. Coding agents are so effective they account for roughly half of Anthropic’s agentic tool calls on their API, generating a significant portion of the revenues that some are projecting to grow to $100B this year. So why haven’t those gains transferred to the rest of the business? Why don’t we already have agents that price our products, optimize our supply chains, and allocate our resources as fluently as they write code?

To close the AI value gap, we need to solve two problems simultaneously: agents that are smart enough to make real business decisions, and agents that are cost-effective enough to deploy across the entire enterprise.

Two Problems, One Gap

Making business decisions isn’t the same as writing code. When an agent writes code, we know if it’s working — it passes the tests or it doesn’t. But when an agent recommends a price change, a new hire, or a new market, how do we know it’s right? We can’t run the quarter in different ways and compare.

The reason we can't test a business decision the way we test code is that every decision fans out into an unbounded tree of consequences. Price a product too high and we lose volume; too low and we sacrifice margin. The "right" answer depends on what competitors do, how demand shifts seasonally, what our costs look like next quarter, and dozens of other variables that are themselves uncertain.

Business decisions are hard to test in advance and we can never fully know if we made the right one even in retrospect. We can only run the quarter one way. We don't get to see the parallel universe where we priced differently, hired differently, expanded into a different market. The counterfactual doesn't exist. Business decisions live in this fog. The best we can do is reason as rigorously as possible before we act — which means having the right context to understand the problem, and the right tools to reason through it.

Even when agents are smart enough to help us run aspects of our businesses, the economics of running them at enterprise scale are daunting. Developers are a small fraction of a typical enterprise workforce — less than 5% in most enterprises — and they’re already generating eye-watering token bills. Extend that to thousands of agents supporting every worker in the enterprise, and the cost is not just high; it’s structurally prohibitive.

I spoke recently with Mark Austin, VP of AI at AT&T on how they’re trying to close the gap. On one hand, Mark wants to create business value by factoring the revenue impact of engineering decisions into how AT&T manages its network. On the other, he’s keenly aware that AT&T is already generating 27 billion tokens a day supporting a few hundred agents. At those numbers, the company simply can’t afford to deploy AI more broadly without first dramatically reducing costs. AT&T is not an outlier. It’s a preview of where every large enterprise is heading.

Project out a couple years from now where every worker is supported by a fleet of agents running around the clock. They can’t be rediscovering business logic, re-reading documentation, and reasoning from scratch on every turn. The economics that work for a few hundred developers collapse entirely when we’re trying to serve the whole enterprise.

Some of my friends will point out that the cost of token inference has been dropping every year, and that this will eventually make the economics a non-issue. Those are often the same people standing in line to invest in Anthropic, expecting its revenues to grow 10x per year for the foreseeable future. They can’t have it both ways. Prices are falling, but usage is exploding faster — and that gap widens dramatically once agents move from developers to the whole workforce.

The good news is that the same three things make our agents both smarter and more cost-effective: context, tools, and post-training.

Missing Meaning

Earlier this year, I had the pleasure of sitting down with Jennifer Li and Jason Cui from a16z to talk about some of the things we’ve been working on at RelationalAI to shrink that value gap. Their recent piece makes a strong case for why we and many others believe that "data and analytics agents are essentially useless without the right context."

Ask a data agent: what was profit last quarter in the Northeast for smartphone accessories? It immediately hits ambiguity. How does our business report profit — gross, operating, net? What counts as revenue — ARR, run rate, recognized? How are fiscal quarters structured? What is “the Northeast” in our organization? Which products qualify as smartphone accessories? None of that is the same for every company, and it’s often either not defined anywhere or defined in conflicting ways across many places. Someone who has been with the business for years carries all of this as intuition. An agent starting cold has none of it, and no amount of raw intelligence compensates for missing these definitions.

The context layer solves this by capturing the core concepts and relationships of a business — the logic and constraints that describe how it should make decisions and how it actually does. Without it, the agent must either reconstruct the knowledge it needs from scratch, burning tokens and running up the bill, or worse: hallucinate it. And even when an agent constructs that knowledge correctly, it needs to remember it across sessions (sometimes called in-context learning), make it governable, and keep it auditable so that others can check its work. A well-built context layer handles all of this. Done right, the context layer isn't a supplement to enterprise data — it's the interpretive lens that gives that data meaning. The agent starts already knowing how the fiscal calendar works, how products are categorized, and how profit is calculated. Fewer tokens. Faster answers. Lower cost on every query, across every agent, all day long.

a16z aren’t the only ones making the case for context. Harvard Business Review has been writing about it. Analysts are writing about it. Foundation Capital called context graphs a trillion-dollar opportunity. It’s hard to have a conversation about enterprise AI right now without the topic coming up. And we agree — but context tells agents what the world looks like. It doesn’t tell them what to do about it.

Bicycles for the Mind

Even with perfect context, agents – like humans – have difficulty knowing how to make good decisions. Decision-making has been studied for decades, with most universities dedicating entire departments to operations research and decision science: how to help people and organizations take the best possible action in the face of uncertainty and real-world constraints. These fields have well-established tools that help run the world’s supply chains, financial markets, and telecom networks.

Let’s change the previous question to: how should we price smartphone accessories to maximize profit in the Southeast two quarters from now? The context requirement is almost identical. But now we also need to model how price affects demand and find the optimal trade-off between margin per unit and volume sold. That’s not retrieval, filtering, or aggregation — the kind of work an agent can farm out to a SQL engine. That’s prediction, optimization, and network analysis. The agent needs an entirely different set of data and context aware tools to reason with.

This is where the analogy of “bicycles for the mind” becomes useful. Humans have always used tools to amplify cognitive abilities — to go far beyond our natural capacities in creativity and information processing. The right tool doesn’t just make a task easier; it makes previously impossible things routine. We all agree it’s essential for data agents to have access to a SQL engine. An LLM doing SQL work by hand would be slower, more expensive, and far less reliable. The same logic applies to decision-making. We can’t expect our business agents to make good decisions if we don’t give them the tools of decision intelligence: data and context aware reasoners for prediction, optimization, graph analysis, and rule-based logic⁽¹⁾.

Giving agents the right tools isn’t just about making them smarter. It’s also about making sure they’re not burning tokens on problems the right tool can answer in milliseconds. When a solver handles the optimization and a predictive model handles the forecasting, the LLM isn’t doing that work token by token. Specialized tools are faster, more reliable, and dramatically cheaper for the kinds of reasoning that underpin real business decisions.

The Graduate And The Professional

Context, in-context learning, and tools get us far — but a generalist with tools and a notepad is still a generalist. For real decisions, we need a trained professional who knows our business.

The scaling laws in AI are mathematical formulas that show how LLM intelligence improves predictably as we increase data, compute, and model size. In fact there are three distinct scaling laws.

Pre-training, the first scaling law, is like sending the model to school — it emerges with broad general capabilities. The second, test-time compute, gives it room to reason through harder problems.

The third is the scaling law for post-training. Pre-training gives us a graduate and test-time compute gives them the space to reason; post-training gives them their apprenticeship. A newly minted graduate arrives with impressive general knowledge, but on day one doesn’t know our systems, our terminology, our customers, or how decisions get made. We have to train them. The same is true for AI. We can significantly improve a model’s capabilities by post-training it on our private data and context. Data that it can never see during pre-training. Post-training teaches a capable model how to operate in our business.

Jensen Huang was very clear about this at this year’s GTC keynote: post-training open-weight models and fusing structured data with generative AI are “the foundation of trustworthy AI”. NVIDIA is investing $26 billion on developing powerful open weight models to make that possible. The scale of that investment reflects the scale of the opportunity. Post-training finishes the job that pre-training started. It isn’t a footnote to pre-training; it’s an intelligence multiplier.

Furthermore, MIT research shows that out-of-the-box open-weight models achieve about 90% of the performance of the closed models at about one sixth of the cost. The research concludes that “optimal reallocation of demand from closed to open models could save the global AI economy about $25 billion annually.” The savings are real and can go a long way towards making enterprise-wide AI economically feasible.

Post-trained open-weight models can handle the majority of enterprise use cases at a fraction of frontier costs. The model learns what our organization knows and how to use the right data and tools to help us make sound business decisions; when to call the solver, when to invoke the demand predictor, and when to loop in a frontier model for something genuinely novel. The professional who knows our business doesn’t need thirty turns to figure out our fiscal calendar. They already know it. And like all good professionals, they know exactly which expert to call.

Minding the Gap

The AI value gap is real. We need smarter and more efficient decision agents to close it.

Context gives agents a shared understanding of our business and compresses the cost of every query they run. Tools give them the ability to make better decisions under real-world constraints. Post-training turns open models into business experts: smarter, faster, cheaper, and more reliable than an agent that doesn’t learn our business.

Together, these make it possible for decisions to happen at machine speed with economics that work. That's the shift from traditional analytics — the dashboard that shows us where we've been — to decision intelligence: the navigation system that tells us the best way to get where we want to go.

That’s what we’re building at RelationalAI: a decision intelligence system that extends our customers’ data clouds with the context, tools, and “push-button” post-training needed to move from understanding their data to acting on it.

⁽¹⁾ It’s important that the tools used by our agents be data and context aware. While in theory all you need is an LLM and a programming language like Python, in practice the right abstractions can make a big difference. As our friends at Kumo recently posted “agents are only as good as the abstraction they work with.” Their experience is very consistent with ours, and we’ve seen similar benefits when using a domain centric abstraction with all the tools needed for good decision making

‍

Related reading:

‍