The simplest way to read the current AI infrastructure boom is not that chatbots got popular. It is that hyperscalers, chip suppliers, power companies, and data center developers are making a large bet on a different software shape: many agents, running for longer than a user session, calling tools, searching private data, checking work, and retrying failed steps. A recent Goldman Sachs analysis frames the tension clearly. Model inference is getting cheaper at a violent pace, while the largest technology companies are still committing hundreds of billions of dollars to AI capex. For builders, the useful question is not whether the spend is too high. It is what kinds of agent products can actually turn cheap tokens into durable compute demand.
Key Takeaways
- Chat-style usage is unlikely to absorb the infrastructure now being built if model serving costs keep falling.
- Agents create new demand because they multiply the amount of work behind each user intent: planning, retrieval, tool calls, execution, validation, and retries.
- The builder opportunity is not simply to wrap a model in an agent loop. It is to find workflows where longer autonomous execution produces measurable economic value.
- The main risk is runaway background work. Agent infrastructure needs budgets, cancellation, observability, and evaluation before it needs more autonomy.
The infrastructure question is really a product question

Goldman Sachs points to a striking mismatch. On one side, Google, Microsoft, Amazon, and Meta are expected to invest a combined $315 billion in 2025 capex, much of it tied to AI infrastructure. Semiconductor revenue is also projected to reach record levels. On the other side, the price of running comparable AI capabilities has dropped rapidly since the first public wave of ChatGPT. The source uses examples such as GPT-4 pricing falling from $60 per million tokens to much cheaper comparable offerings, and reasoning model prices compressing after DeepSeek R1 and OpenAI o3 Mini. If every app remains a short chat box, the supply story looks fragile. If every knowledge worker starts delegating thousands of bounded tasks to agents, it looks different.
Agents matter to infrastructure because they change the unit of demand from a prompt to a job.
That unit shift is the builder story. A chat request might be a few thousand tokens plus a generated answer. An agentic job can include decomposition, multiple model calls, retrieval over enterprise stores, browser or API actions, code execution, document generation, checks against policy, human approval, and a final write-back into a system of record. The same user intent can easily become 10, 100, or 1,000 model interactions. This is why agents are central to the compute debate. They are not automatically smarter than chat. They are more operationally expensive because they keep working after the first answer, and sometimes after the user has stopped watching.
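To make the unit shift concrete, here is a minimal sketch of how one intent fans out once it becomes a job. The step kinds, counts, and token figures are illustrative assumptions, not measurements:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str        # e.g. "plan", "retrieval", "model_call", "check", "write_back"
    tokens: int = 0  # rough token estimate for model-backed steps

@dataclass
class AgentJob:
    intent: str
    steps: list[Step] = field(default_factory=list)

    def model_interactions(self) -> int:
        return sum(1 for s in self.steps if s.kind == "model_call")

# A chat request is one intent and one model call. An agent job for the
# same intent fans out into many steps before anything is written back.
job = AgentJob(intent="compare vendor contracts")
job.steps += [Step("plan", tokens=2_000)]
job.steps += [Step("retrieval") for _ in range(8)]
job.steps += [Step("model_call", tokens=4_000) for _ in range(40)]
job.steps += [Step("check", tokens=1_000) for _ in range(5)]
job.steps += [Step("write_back")]
print(job.model_interactions())  # 40 model calls behind a single user intent
```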
Source Card
A generational infrastructure buildout might hinge on AI agents
The piece is useful as a market signal, not as a product roadmap. It argues that falling model costs weaken the case for chat-only demand, and that compute-intensive use cases such as reasoning models, agentic systems, and robotics are needed to justify the AI infrastructure buildout. Agent builders should read it as a warning: infrastructure abundance will not rescue weak workflows, but strong autonomous workflows could become the demand engine.
Goldman Sachs
What changed for agent builders

| Signal | Why it matters |
|---|---|
| Token prices keep falling | Cheap inference expands what agents can try, but it also compresses margins for products that only resell model calls. |
| Capex keeps rising | Cloud capacity may become more available, but buyers will still ask whether autonomous runs create real savings, revenue, or risk reduction. |
| Reasoning models are improving | More capable planning can reduce hand-built orchestration, but it can also hide errors inside longer chains of plausible actions. |
| Enterprise data remains fragmented | The agent bottleneck shifts from model access to permissions, retrieval quality, data contracts, and integration with systems of record. |
| Background work is becoming normal | Teams need queues, budgets, checkpoints, and audit trails because agent jobs will not always fit inside interactive request-response patterns. |
The biggest mistake is to treat lower token cost as permission to make agents sloppy. Cheaper retries can mask bad planning, weak retrieval, and vague success criteria during a demo. At scale, those defects show up as latency, cloud bills, user distrust, and support escalations. Engineers should model agent cost per completed job, not cost per token. A procurement analyst does not care that a vendor comparison used 200,000 cheap tokens if the output missed a contract clause. A security team does not care that a remediation agent ran overnight if it changed the wrong configuration. Autonomy has to be measured against the business artifact it produces.
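Cost per completed job is straightforward to compute and much harder to game than cost per token. A hedged sketch with invented numbers; note that in this example human rework, not inference, dominates the unit cost:

```python
def cost_per_completed_job(
    runs: int,                    # total agent runs attempted
    completed: int,               # runs that produced an accepted artifact
    tokens_per_run: int,          # average tokens across all model calls in a run
    price_per_mtok: float,        # blended model price, dollars per million tokens
    tool_cost_per_run: float,     # API, browser, sandbox, and storage costs
    rework_minutes: float,        # average human cleanup per completed job
    loaded_rate_per_hour: float,  # fully loaded cost of the reviewing human
) -> float:
    token_cost = runs * tokens_per_run * price_per_mtok / 1_000_000
    tool_cost = runs * tool_cost_per_run
    rework_cost = completed * (rework_minutes / 60) * loaded_rate_per_hour
    return (token_cost + tool_cost + rework_cost) / completed

# Illustrative numbers only: 120 runs to complete 100 jobs.
# Token cost is $48, tool cost $18, human rework $900.
print(cost_per_completed_job(
    runs=120, completed=100, tokens_per_run=200_000,
    price_per_mtok=2.0, tool_cost_per_run=0.15,
    rework_minutes=6, loaded_rate_per_hour=90,
))  # ~ $9.66 per completed job, most of it rework
```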
Builder note
Design every serious agent with a run budget before you design its personality. Set maximum tool calls, maximum wall-clock time, maximum spend, allowed data domains, cancellation paths, and escalation triggers. Then log the run as a durable object: plan, inputs, retrieved evidence, tool calls, intermediate decisions, approvals, final output, and post-run evaluation. This turns an agent from a magic session into an inspectable production process.
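A minimal sketch of that budget-plus-record shape, assuming Python; every field name here is an assumption for illustration, not a standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RunBudget:
    max_tool_calls: int = 50
    max_wall_clock_s: int = 1_800
    max_spend_usd: float = 5.00
    allowed_domains: tuple[str, ...] = ("crm", "docs")  # data the run may touch

@dataclass
class RunRecord:
    # The durable object: everything needed to inspect or replay the run.
    plan: str
    inputs: dict
    evidence: list = field(default_factory=list)    # retrieved sources
    tool_calls: list = field(default_factory=list)  # action, args, result
    decisions: list = field(default_factory=list)   # intermediate choices
    approvals: list = field(default_factory=list)   # who signed off, and when
    output: str | None = None
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    cancelled: bool = False

def over_budget(record: RunRecord, budget: RunBudget, spend_usd: float) -> bool:
    # Checked before every tool call; exceeding any limit triggers
    # cancellation or escalation rather than silent continuation.
    elapsed = (datetime.now(timezone.utc) - record.started_at).total_seconds()
    return (len(record.tool_calls) >= budget.max_tool_calls
            or elapsed >= budget.max_wall_clock_s
            or spend_usd >= budget.max_spend_usd)
```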
Where agentic demand is most credible
- Research and analysis workflows where completeness matters, such as market scans, legal precedent review, scientific literature triage, diligence packets, and competitive monitoring. These tasks justify long runs because humans already spend hours gathering, comparing, and summarizing evidence.
- Software maintenance tasks that involve many small edits across repositories, tests, dependency updates, migration checks, and documentation changes. The agent value is not one clever patch; it is persistent execution through a boring backlog.
- Operations workflows with structured tools, such as invoice exception handling, CRM cleanup, support ticket enrichment, insurance intake, security alert triage, and compliance evidence collection. These are attractive because actions can be bounded and verified.
- Simulation-heavy or search-heavy domains where agents can explore alternatives, score candidates, and refine outputs. Examples include experiment planning, supply chain scenarios, financial modeling, and design space exploration.
The common pattern is not maximum intelligence. It is enough structure for the agent to know when it is done. Agents work best when the environment has clear artifacts, known tools, permission boundaries, and external checks. They struggle when a task depends on ambiguous taste, hidden organizational context, or political judgment. Builders should resist the temptation to sell agents as digital employees. The more useful framing is autonomous job execution with human-defined constraints. That framing makes product requirements clearer: what is the input, what artifact is produced, what evidence supports it, who approves it, where is it written back, and how do we know it improved over the baseline?
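Those requirements map almost one-to-one onto a job specification. A hedged sketch, with every field name invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class JobSpec:
    input_ref: str           # what goes in, e.g. a ticket ID or document URI
    artifact: str            # what is produced, e.g. "diligence memo"
    evidence_required: bool  # must every claim link to a retrieved source?
    approver_role: str       # who signs off before write-back
    write_back_target: str   # the system of record receiving the result
    baseline_metric: str     # how improvement over the human process is measured

spec = JobSpec(
    input_ref="ticket:48213",
    artifact="enriched support ticket",
    evidence_required=True,
    approver_role="support_lead",
    write_back_target="helpdesk",
    baseline_metric="median handle time vs. prior quarter",
)
```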
Failure modes that will decide who gets adopted
- Runaway loops: the agent keeps searching, retrying, or debating itself because no completion condition exists (a minimal loop guard is sketched after this list).
- Evidence laundering: weak retrieval gets transformed into confident prose, making the final answer look better sourced than it is.
- Tool overreach: an agent uses a valid credential for an action the user did not intend, especially in admin, finance, or production environments.
- Silent degradation: a model or retrieval index changes, and the agent still completes runs while quality drops.
- Queue congestion: background jobs pile up, making urgent work slower and turning cheap inference into expensive operations debt.
- Evaluation theater: teams measure answer style or benchmark scores instead of completed task accuracy, exception rate, and human rework.
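Completion conditions are cheap to enforce in code. A minimal sketch of a bounded agent loop; propose_step, execute, and is_done are hypothetical stand-ins for a real planner, tool layer, and validator:

```python
def run_bounded(task, propose_step, execute, is_done,
                max_steps: int = 30, max_retries_per_step: int = 2):
    # Every iteration must either satisfy an explicit completion check
    # or consume bounded retries; there is no path to an endless loop.
    state = {"task": task, "history": []}
    for _ in range(max_steps):
        if is_done(state):  # explicit completion condition
            return state
        step = propose_step(state)
        for _ in range(max_retries_per_step + 1):
            ok, result = execute(step)
            if ok:
                state["history"].append((step, result))
                break
        else:
            raise RuntimeError(f"step failed after retries: {step}")
    raise TimeoutError("no completion condition met within step budget")
```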
These failure modes are also buying criteria. Enterprise customers will ask whether agents can be observed, constrained, and rolled back. Founders should expect the winning infrastructure layer to look less like a chatbot SDK and more like a workflow runtime: queues, state machines, policy engines, secret handling, evaluation harnesses, replay, provenance, and cost accounting. That does not mean every startup must build all of it from scratch. It does mean the agent product has to own the user-visible reliability contract. If a model provider drops prices by another factor of 10, customers will not reward an app whose autonomous runs are untrustworthy. They will reward the app that uses cheaper inference to complete more jobs with fewer exceptions.
Adoption guidance
Start with human-in-the-loop agents in high-friction workflows where the current process is measurable. Capture baseline time, error rate, throughput, and rework. Ship the agent first as a drafter or investigator, then let it execute low-risk actions after it proves reliability. Add autonomy by action class, not by marketing tier. A good sequence is read-only research, draft artifact, suggested update, approved write-back, limited automatic write-back, then broader execution.
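One way to make that sequence enforceable rather than aspirational is an ordered autonomy gate, where each tool action is tagged with an action class. A sketch; the tier names follow the sequence above, and the action classes are hypothetical:

```python
from enum import IntEnum

class AutonomyTier(IntEnum):
    # Ordered by blast radius; an agent is promoted one tier at a time.
    READ_ONLY_RESEARCH = 0
    DRAFT_ARTIFACT = 1
    SUGGESTED_UPDATE = 2
    APPROVED_WRITE_BACK = 3
    LIMITED_AUTO_WRITE_BACK = 4
    BROADER_EXECUTION = 5

# Hypothetical mapping from action class to the minimum tier it requires.
REQUIRED_TIER = {
    "search": AutonomyTier.READ_ONLY_RESEARCH,
    "draft_doc": AutonomyTier.DRAFT_ARTIFACT,
    "propose_crm_edit": AutonomyTier.SUGGESTED_UPDATE,
    "write_crm_record": AutonomyTier.APPROVED_WRITE_BACK,
}

def allowed(action: str, granted: AutonomyTier) -> bool:
    # Unknown actions default to the highest tier, so they stay blocked.
    return granted >= REQUIRED_TIER.get(action, AutonomyTier.BROADER_EXECUTION)

assert allowed("search", AutonomyTier.READ_ONLY_RESEARCH)
assert not allowed("write_crm_record", AutonomyTier.SUGGESTED_UPDATE)
```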
What is still uncertain
The Goldman Sachs signal is right to separate chat demand from agentic demand, but several unknowns remain. First, no one knows how much agent work users actually want to delegate once novelty fades and accountability becomes real. Second, agent efficiency may improve as quickly as model efficiency, reducing compute intensity per job. Better planning, caching, distillation, smaller specialist models, and tighter retrieval could make tomorrow's agent runs far cheaper than today's prototypes. Third, regulation and enterprise risk controls may slow adoption in exactly the domains where agents could consume the most compute. The infrastructure buildout may be rational, premature, or both.
For builders, the practical conclusion is balanced. Do not assume infinite compute demand will make any agent business work. Also do not dismiss the buildout as hype. The next durable software category may come from agents that turn abundant inference into completed work across messy systems. The winners will make long-running autonomy boring: budgeted, observable, reversible, and tied to business outcomes. If the industry is building the power grid for agents, the missing layer is not another chat surface. It is the operating discipline that lets customers trust machines with actual work.
- Goldman Sachs, A generational infrastructure buildout might hinge on AI agents, Apr. 4, 2025, https://www.goldmansachs.com/insights/articles/a-generational-infrastructure-buildout-might-hinge-on-ai-agents
