
AI agents are becoming an infrastructure problem, not a demo problem

A new public sector AI infrastructure report signals a broader shift for agent builders: production value now depends less on prompt tricks and more on data access, hybrid deployment, cost control, and operational trust.

Agent Mag Editorial

The Agent Mag editorial team covers the frontier of AI agent development.

May 11, 2026 · 7 min read
An evidence packet representing production AI agent infrastructure decisions

TL;DR

Production AI agents are shifting the builder challenge from model access to governed data, hybrid deployment, cost control, and trustworthy workflow execution.

The useful signal in Google Cloud's 2025 public sector AI infrastructure report is not that organizations want generative AI. Everyone already knew that. The sharper point for agent builders is that adoption has outrun the easy part. The report says 98% of surveyed organizations are experimenting with, developing, or using gen AI in production, while 70% report difficulties around data governance, integrating data into AI models, or insufficient training data. That gap is where most agent products will either become durable infrastructure or remain expensive prototypes. Agents are not just another UI on top of a model. They are latency-sensitive, permissioned, data-hungry workflows that touch search, records, identity, translation, routing, escalation, audit, and cost management.

Source Card

PDF · Google Cloud, State of AI Infrastructure: 2025 Public Sector Report

The report is vendor-published, so it should be read as a market signal rather than neutral doctrine. Still, its survey of more than 500 technology leaders captures the operational reality facing agent builders: broad adoption, strong demand for hybrid cloud, heavy concern about cost, and persistent data readiness problems.

services.google.com

Key Takeaways

  • Agent adoption is moving from experimentation into infrastructure planning, which means builders must design for security, scale, and cost from the first production use case.
  • The dominant use cases named in the report (data analysis, customer service automation, and internal process automation) map directly to agent workflows that require reliable retrieval, tool use, and escalation.
  • Hybrid cloud and edge interest are not just procurement preferences. They shape where agent memory, inference, logging, and sensitive data handling can legally and practically live.
  • The biggest blocker is not model capability alone. Data governance, data integration, and training data quality are now the main bottlenecks for dependable agents.
Index cards showing agent workflow boundaries and governed data access

The market has moved from "if" to "how"

The report frames a change many builders are already seeing in sales calls and internal platform reviews. Leaders are no longer asking whether AI will matter. They are asking how to integrate it with legacy systems, how to secure it, how to scale it, and how to keep cost under control. Google Cloud reports that 79% of technology leaders consider gen AI very or extremely important to current and future operations. It also reports that 64% of organizations are pursuing both internal and external-facing use cases. For agent teams, that matters because internal agents and public-facing agents have different failure costs. A back-office summarization agent can be wrong and trigger human correction. A resident services agent, healthcare navigator, tax assistant, or claims agent can misroute a person, expose private data, or create a record that must be defended later.

Signal: 98% of organizations are experimenting with, developing, or using gen AI in production
Why it matters: The competitive question is no longer access to a model. It is whether your agent can survive production constraints better than the next tool.

Signal: 70% report data governance, model integration, or training data challenges
Why it matters: Retrieval quality, permissions, lineage, and data contracts are core agent infrastructure, not optional plumbing.

Signal: 74% prefer a hybrid cloud approach for gen AI deployments
Why it matters: Agent architectures need to support split execution, with some data and tools staying in controlled environments.

Signal: 73% consider edge deployment important
Why it matters: Builders serving field work, low-latency services, or data sovereignty use cases should plan for distributed inference and sync problems.

Signal: 83% prioritize cost efficiency in gen AI infrastructure
Why it matters: Agent orchestration must include budget-aware routing, caching, batching, and graceful degradation.
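The cost signal is only actionable if spend is measured per step rather than per session. As a minimal sketch (the step names, rates, and thresholds below are illustrative placeholders, not figures from the report), a per-session ledger might look like:

```python
from dataclasses import dataclass, field

# Hypothetical step-level cost ledger. Step names and per-unit rates are
# illustrative assumptions; real rates depend on provider pricing.
@dataclass
class StepCost:
    step: str           # e.g. "retrieval", "tool_call", "generation", "retry"
    tokens: int = 0
    tool_calls: int = 0

@dataclass
class SessionLedger:
    steps: list = field(default_factory=list)

    def record(self, step, tokens=0, tool_calls=0):
        self.steps.append(StepCost(step, tokens, tool_calls))

    def cost(self, token_rate=0.000002, tool_rate=0.001):
        # Placeholder rates: dollars per token and per tool invocation.
        return sum(s.tokens * token_rate + s.tool_calls * tool_rate
                   for s in self.steps)

    def expensive_tail(self, threshold):
        # Flag sessions whose total spend exceeds a budget threshold,
        # since an average cost per conversation hides the expensive tail.
        return self.cost() > threshold

ledger = SessionLedger()
ledger.record("retrieval", tokens=1200, tool_calls=2)
ledger.record("generation", tokens=3500)
ledger.record("retry", tokens=3500)
```

Recording retries and tool calls separately is the point: a session that loops twice before answering looks identical to a clean one in session-level averages but stands out immediately in a step-level ledger.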

What changes when the agent becomes the workflow

A chatbot can answer a question and disappear. An agent that runs a workflow accumulates obligations. It needs to know which system of record is authoritative. It needs to decide whether a tool call is safe. It needs to preserve context across turns without leaking information between users. It needs to hand off to a human with enough evidence to be useful. It needs to behave consistently across web, phone, email, and internal consoles. The source report highlights public sector examples such as resident communication across channels and translation support for multilingual communities. Strip away the branding and the builder lesson is simple: the agent's real value appears when it connects fragmented services, but that is also where risk concentrates.

A marked paper ledger representing agent cost and governance risk

Builder note

If your agent roadmap starts with model selection and ends with a demo, it is upside down. Start with the workflow boundary. Identify which records the agent may read, which actions it may take, which actions require approval, which language or accessibility requirements apply, and what audit trail must exist after every session. Then choose models, retrieval systems, and orchestration layers that fit those constraints. In regulated or public-facing settings, an agent that cannot explain where an answer came from is not production-ready, even if the answer sounds correct.

  1. Choose one high-volume workflow with a measurable service outcome, such as response time, case deflection, form completion, triage accuracy, or staff hours saved.
  2. Map the data path before the conversation path. List every source the agent needs, the freshness requirement, the permission model, and the owner who can approve access.
  3. Separate answer generation from action execution. Let the agent draft, reason, and recommend, but gate irreversible actions through policy checks, deterministic validation, or human approval.
  4. Instrument cost at the step level. Track retrieval calls, tool calls, model tokens, retries, escalations, and failed sessions. Average cost per conversation hides the expensive tail.
  5. Design escalation as a first-class feature. The handoff should include user intent, evidence retrieved, actions already attempted, confidence signals, and policy reasons for escalation.
  6. Test multilingual and accessibility behavior with real tasks, not translated marketing prompts. Public and enterprise agents often fail when translation changes legal, procedural, or domain-specific meaning.
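The escalation handoff in step 5 can be made concrete as a structured record. The sketch below assumes a simple dataclass schema; the field names and example values are hypothetical, not a standard:

```python
from dataclasses import dataclass, field

# Hypothetical handoff record for human escalation. Field names mirror the
# checklist above: intent, evidence, attempted actions, confidence, policy reason.
@dataclass
class Handoff:
    user_intent: str
    evidence: list = field(default_factory=list)        # retrieved docs, citations
    actions_attempted: list = field(default_factory=list)
    confidence: float = 0.0
    policy_reason: str = ""                             # why the agent escalated

def escalate(intent, evidence, attempted, confidence, reason):
    # Package everything a human reviewer needs to pick up the case
    # without replaying the whole conversation.
    return Handoff(intent, list(evidence), list(attempted), confidence, reason)

handoff = escalate(
    intent="dispute a utility bill",
    evidence=["billing_record_2026_04", "tariff_policy_v3"],
    attempted=["drafted adjustment request"],
    confidence=0.42,
    reason="low confidence on tariff interpretation",
)
```

Treating the handoff as a typed record rather than a free-text summary makes escalation quality measurable: reviewers can score whether the evidence and policy reason were actually sufficient to act.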

Hybrid and edge are agent architecture decisions

The report's 74% hybrid cloud preference and 73% edge interest should not be dismissed as enterprise conservatism. Agent builders are increasingly asked to operate where data cannot freely move. A city service agent may need cloud-scale language understanding while keeping resident records in a local system. A utility maintenance agent may need to work in the field with intermittent connectivity. A defense, health, finance, or industrial agent may need inference near sensitive systems for latency or sovereignty reasons. These requirements affect more than hosting. They determine where prompts are assembled, where embeddings are stored, how logs are redacted, how evaluations run, and whether a fallback model exists when connectivity drops.

  • Failure mode: the agent retrieves from a stale replica and gives a confident answer that contradicts the system of record.
  • Failure mode: permissions are checked at login but not at tool-call time, allowing the agent to summarize or act on data the user should not access.
  • Failure mode: cost controls are added after launch, so the agent handles simple requests cheaply but burns budget on long, looping sessions.
  • Failure mode: multilingual support is treated as translation only, while intent detection, policy language, and document references remain optimized for one language.
  • Failure mode: edge deployments collect logs that cannot be reconciled centrally, leaving teams unable to evaluate drift, abuse, or recurring errors.
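The second failure mode above, permissions checked at login but not at tool-call time, has a direct structural fix: authorize every tool invocation against the caller's current role. A minimal sketch, with a hypothetical role-to-tool policy table:

```python
# Illustrative policy table; roles and tool names are hypothetical.
PERMISSIONS = {
    "caseworker": {"read_case", "draft_reply"},
    "resident":   {"read_own_case"},
}

def authorize_tool_call(role: str, tool: str) -> bool:
    # Re-check on every call: a session that was valid at login may not be
    # allowed to touch this specific record or action now.
    return tool in PERMISSIONS.get(role, set())

def call_tool(role: str, tool: str, run):
    # Gate the action itself, not just the conversation entry point.
    if not authorize_tool_call(role, tool):
        raise PermissionError(f"{role} may not call {tool}")
    return run()
```

In a real deployment the check would consult the identity provider or policy engine rather than an in-memory dict, but the invariant is the same: no tool executes, and no record is summarized, without a fresh authorization decision.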

The production agent is not the model plus tools. It is the model plus tools plus permissions plus data contracts plus cost limits plus evidence.

Cost efficiency is now a product feature

The report says 83% of organizations prioritize cost efficiency when adopting gen AI infrastructure. That should change how founders package agents and how engineering teams design them. A workflow agent that uses the largest model for every step will be easy to prototype and hard to renew. Production systems need routing policies: smaller models for classification, retrieval before generation where possible, caching for repeated answers, structured outputs for tool calls, and stop conditions for loops. Cost also has a trust dimension. If operators cannot predict usage under peak load, they will cap access, narrow the scope, or keep the agent in pilot status.
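A routing policy of this kind can be sketched in a few lines. The model names, task kinds, and cache strategy below are illustrative assumptions, not a recommendation of specific providers:

```python
# Budget-aware routing sketch: cache for repeated prompts, a small model for
# classification and extraction, a large model only for open-ended work.
CACHE = {}  # prompt -> cached answer (real systems would scope and expire this)

def route(task_kind: str, prompt: str) -> str:
    if prompt in CACHE:
        return "cache"
    if task_kind in ("classify", "extract"):
        return "small-model"    # placeholder name for a cheaper model tier
    return "large-model"        # placeholder name for the expensive tier

def answer(task_kind, prompt, call_model):
    # call_model(target, prompt) is the caller-supplied model client.
    target = route(task_kind, prompt)
    if target == "cache":
        return CACHE[prompt]
    result = call_model(target, prompt)
    CACHE[prompt] = result
    return result
```

Even this simple policy changes the renewal conversation: the operator can see that classification traffic never touches the expensive tier, and that repeated questions cost nothing after the first answer.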

The uncertain part is how much of this infrastructure becomes standardized. The report points toward managed platforms, hybrid deployment, and cloud provider services, but agent teams still face fragmented choices across vector stores, gateways, observability tools, policy engines, evaluation harnesses, and model providers. Builders should avoid hard-coding assumptions that every customer will accept the same cloud, same region, same logging policy, or same model. The safer bet is modularity: pluggable model backends, explicit data connectors, portable evaluation sets, and clean separation between orchestration logic and customer-specific policy.

Adoption guidance

For a first production agent, do not chase the broadest assistant. Build the narrowest workflow that forces you to solve the real infrastructure problems: authenticated retrieval, governed tool use, human escalation, measurement, and budget control. If the agent cannot operate reliably on one bounded process, adding more skills will only multiply ambiguity. If it can, you have the foundation for a platform rather than a feature.

  • Google Cloud, 2025 Public Sector Report: State of AI Infrastructure, services.google.com, https://services.google.com/fh/files/misc/google_cloud_state_of_ai_infra_public_sector.pdf

Frequently Asked

What is the main infrastructure lesson for AI agent builders?

The main lesson is that production agents need governed data access, permission checks, observability, cost controls, and escalation paths. Model quality matters, but it is only one part of the production system.

Why does hybrid cloud matter for AI agents?

Hybrid cloud matters because many agents need cloud-scale model capabilities while sensitive records, identity systems, or operational tools remain in private environments. Builders should plan for split execution and clear data boundaries.

What should teams measure before scaling an agent?

Teams should measure task completion, escalation quality, retrieval accuracy, policy violations, latency, step-level cost, user satisfaction, and error recovery. A high answer rate is not enough if the agent is unsafe or uneconomical.

How should builders reduce agent operating costs?

Use model routing, caching, retrieval before generation, smaller models for simple classification, strict loop limits, structured tool calls, and monitoring at the workflow-step level rather than only at the session level.

References

  1. PDF: Google Cloud, 2025 Public Sector Report: State of AI Infrastructure, services.google.com
