Cloud operations today are, for the most part, mature. Enterprises have a comfort level with cloud: it has a defined role in an operational sense, and there is enough support available, through a combination of architectural best practices, community knowledge, visibility, and automation, to run most digital applications and workloads optimally in public, private, or hybrid cloud environments.
Moreover, cloud technology has become key to widespread access to AI. In years past, only a select few private companies had access to the high-performance compute capacity required to run generative AI workloads. Cloud is proving to be the great leveler, making that level of compute, and the AI services that run on it, accessible to any organization that wants them.
But it comes at a cost. That cost is not necessarily financial, although finances are a factor in decision-making. The bigger cost is to cloud optimization approaches. Put simply, widespread and intensive AI adoption is starting to push organizations beyond their comfort zones when it comes to cloud configurations. Targeted action is required to get comfortable with cloud again.
Understanding AI characteristics
To understand why established norms in cloud operations are being tested, one must first understand the nature of the AI workloads that cloud is now being asked to drive.
AI workloads are powerful, both in the sense of the value they can bring to enterprises and the amount of compute resources required to run them at scale.
This will only increase as Agentic AI becomes the dominant type of AI in enterprise environments. Agentic AI signifies a tighter integration of AI technology into business processes, with autonomous or semi-autonomous software agents handling key processes, or parts of them, to meet specific goals. These systems can make rapid decisions, manage complex tasks, and adapt to changing conditions, assuming the underlying systems perform as expected and required, but we’ll get to that.
What enterprises need to know is that Agentic AI is more interactive than other forms of AI, “talking” constantly to source systems, data repositories, external tools, databases, and APIs, which makes it a far more latency-sensitive evolution of the technology. A cloud or connectivity disruption or failure could prevent an agent-led process from kicking off at all, or from completing what it was intended to do.
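To make that failure mode concrete, here is a minimal, self-contained sketch. The step names and latencies are illustrative assumptions, not any real agent framework; it simply shows how one slow dependency in a sequential agent workflow stalls the entire chain:

```python
# Minimal sketch of a sequential agent workflow: each step depends on
# the previous one, so one slow or failed network call stalls the chain.
# All names and latencies here are hypothetical illustrations.
import time


class StepTimeout(Exception):
    """Raised when a single dependency exceeds its latency budget."""


def call_dependency(name: str, simulated_ms: float, budget_ms: float = 100.0) -> str:
    """Stand-in for a call to a source system, tool, or API."""
    time.sleep(simulated_ms / 1000)
    if simulated_ms > budget_ms:
        raise StepTimeout(f"{name} took {simulated_ms:.0f} ms (budget {budget_ms:.0f} ms)")
    return f"{name}-result"


def run_agent_workflow() -> None:
    # Each step consumes the previous step's output, so steps cannot overlap.
    steps = [("data-repository", 40), ("external-tool", 60), ("database", 250)]
    context = []
    for name, latency in steps:
        context.append(call_dependency(name, latency))  # raises on budget breach
    print("workflow complete:", context)


try:
    run_agent_workflow()
except StepTimeout as err:
    print("agent-led process failed mid-chain:", err)
```

Here the third call blows its latency budget, and everything downstream of it never runs, which is exactly the exposure a connectivity disruption creates for an agent-led process.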
The main thing to understand about AI workloads is that they have different characteristics from the workloads used to define today’s cloud operational parameters. That means past decisions made to help a digital application or workload perform optimally in the cloud do not always carry over to AI. Today’s cloud setups were not designed to meet this very different set of requirements, nor were they ever intended to be.
For enterprises, it’s clear that the same effort that went into optimizing cloud setups for a digital context must now be repeated for AI.
The onus is on enterprises to understand and capture the characteristics of their different AI workloads, such that supporting cloud infrastructure can be architected and configured to meet evolving performance needs.
What this will look like in the cloud
For most enterprises, the reality is that AI and the source systems it taps into run in multiple clouds, in multiple data centers, and across a complex network of owned and unowned connectivity links.
Not all AI services will be available in a local region or zone, and that may be an overriding factor in an enterprise’s choice of AI model.
From an operational excellence perspective, enterprises need to determine where the infrastructure underpinning an AI service runs and where the users of that service are located, then assess whether the current cloud environment can meet the resulting performance requirements or whether changes need to be made.
This includes understanding the extent of the AI’s exposure to “common” infrastructure: for example, a large volume of traffic funneled over a single fiber link, or through a single aggregation point such as a point of presence in a high-density data center with a high concentration of AI service providers. Such concentration risk and single points of failure may exceed internal risk tolerances, given the increasingly critical role that AI plays.
Enterprises need to understand how every provider or part of their AI service delivery chain operates. How does a provider prioritize traffic at certain transit or hand-off points? Do they perform their own load balancing? How will this impact AI service delivery? The answers to these questions may give enterprises cause to re-architect their cloud setups to diversify traffic routes and improve redundancy options.
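One way to start answering those questions is to measure. The sketch below, using hypothetical endpoint hostnames that stand in for real provider addresses, compares average TCP connect latency to alternative endpoints as a rough first-pass signal of path diversity; it does not capture full application round trips, only the network leg:

```python
# Minimal sketch: compare TCP connect latency to alternative service
# endpoints as a first-pass check on route diversity. The hostnames
# below are hypothetical placeholders, not real provider endpoints.
import socket
import time

ENDPOINTS = [
    ("ai-provider.example.com", 443),      # primary region (hypothetical)
    ("ai-provider-alt.example.com", 443),  # alternate region (hypothetical)
]


def connect_latency_ms(host: str, port: int, attempts: int = 5) -> float:
    """Average TCP handshake time in milliseconds over several attempts."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=3):
            samples.append((time.perf_counter() - start) * 1000)
    return sum(samples) / len(samples)


for host, port in ENDPOINTS:
    try:
        print(f"{host}: {connect_latency_ms(host, port):.1f} ms avg")
    except OSError as err:
        print(f"{host}: unreachable ({err})")
```

A persistent gap between endpoints, or a shared failure across all of them, is the kind of evidence that justifies diversifying routes or adding redundancy.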
These decisions will affect performance efficiency. A round-trip response time of 50ms might be acceptable for a basic generative AI application, such as a user asking a question and expecting a contextual response. But for a busy Agentic AI system that chains many queries together, 50ms per response adds up quickly. Users may experience excessive transaction times, timeouts, or other congestion- and latency-related issues as a result.
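A back-of-the-envelope calculation shows how quickly that compounds. The call counts below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope sketch of how per-call latency compounds in a
# sequential agentic workflow. Call counts are illustrative assumptions.
ROUND_TRIP_MS = 50  # per-query round trip from the example above

for calls in (1, 20, 100, 500):
    total_ms = calls * ROUND_TRIP_MS
    print(f"{calls:>3} sequential calls -> {total_ms / 1000:.1f} s of pure network wait")
```

A single question-and-answer exchange barely notices 50ms, but an agentic workflow chaining 100 sequential calls spends five full seconds on network wait alone, before any compute happens.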
Enterprises can improve performance efficiency by proactively identifying optimization opportunities in how traffic is routed and how cloud resources are used.
About the author: Mike Hicks is a Principal Solutions Analyst at Cisco ThousandEyes. He is a recognized expert in network and application performance, with more than 30 years of industry experience supporting large, complex networks and working closely with infrastructure vendors on application profiling and management. He is the author of "Managing Distributed Applications: Troubleshooting in a Heterogeneous Environment" (Prentice Hall 2000) and "Optimising Applications on Cisco Networks."
Edited by Erik Linask