Going Cloud Native? Get Ahead of Challenges Before They Start

You know that you want to move your applications to the cloud to achieve the always-on, scalable, responsive and agile capabilities your customers expect. But you also know that your developers may not be ready for the challenges in designing, developing, deploying and operating microservices-based applications. You want to be sure your developers have the support they need to be productive. You want them to spend their time realizing the benefits of cloud native computing, not chasing issues and resolving problems.
Microservices, containers and DevOps give organizations the power to scale, adapt and stay competitive. But these technologies and techniques also increase complexity, which shows up as unprecedented volumes of observability data and as unpredictable, rapidly rising observability costs.
Many enterprise developers are not familiar with cloud native observability technologies or with the best practices for using them to achieve the benefits of cloud native infrastructure. Dealing with cloud native complexity places a new burden on developers who are responsible for achieving scalability, reliability and agility. They need a concise, focused dashboard to be as productive as possible.
Citigroup’s Lessons in Moving to Cloud Native
Before joining Intellyx, I led cloud migration for the Treasury and Trade Solutions (TTS) division at Citigroup. We knew we needed the benefits of cloud native, but we didn’t quite understand the best way to get them.
TTS is the global wholesale-banking division of Citigroup. Our customers were Fortune 1000 corporations, governments and e-commerce companies. One of the most challenging aspects of our business was creating and maintaining the computing infrastructure capacity needed to process and track millions of small international payment requests daily, including major corporation and government payrolls, driver payments and social security payments. And these types of small international payments were rapidly growing in volume as the world moved more and more toward internet commerce.
To handle the ever-growing payment-processing load, we needed the ability to scale up and down, depending on the variations in seasonal workloads, and to pay for what we actually used so we could deliver the services at a competitive price.
Why Transition to a Microservices Environment?
A reliable cloud native environment is a requirement of the transition. Moving to a modern microservices and container-based architecture delivers speed, efficiency, availability and the ability to innovate faster. These are key competitive advantages, especially in a world in which a new generation of born-in-the-cloud companies is luring away customers hungry for new features, fast transactions and always-on service.
Our colleagues in the consumer bank, for example, were especially vulnerable to disruption by born-in-the-cloud FinTechs. Customers, especially younger ones, would judge a bank entirely by the ease of use and capabilities of its mobile app.
To meet these challenging requirements, we knew we had to get it right. Not getting it right meant lost revenue, lost customers, increased costs and being unable to compete effectively.
We tried to follow Netflix’s best-practice example: break up monolithic applications into microservices and organize our developers into small teams responsible for one or more of those microservices. But we often got stuck on infrastructure issues, such as building out and repairing the CI/CD pipeline and triaging misconfigurations, bugs and production issues.
Challenges of Getting Microservices Right
The relationship of microservices to scale-out infrastructure creates one of the biggest challenges in moving to cloud native.
Microservices require a very different type of design and development approach than the typical monolithic architecture. The right observability dashboard can help developers know where they are going, how to quickly find and fix issues, and when they have succeeded at realizing cloud native benefits.
You need to ensure your skilled developers can focus on achieving cloud native benefits without spending a lot of time on tooling and infrastructure issues that can distract them from working on business logic. And you need to ensure you get the most out of them without letting them get too stressed or burned out.
Moving to the DevOps model is a key part of the challenge. And it’s more of a culture change than a technology change, which means more stress. In the DevOps model, for example, there’s no Ops department to call to provision and secure infrastructure components such as databases and app servers.
Instead, cloud native infrastructure provisioning and configuration are performed using APIs, which means developers have to take on those responsibilities and basically do the work that an Ops team would do for them. (Even though they can do so using automated CI/CD pipelines, it’s still an additional responsibility.)
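To make "provisioning through APIs" concrete, here is a minimal sketch of the pattern, not any provider's real SDK. The names ResourceSpec, FakeCloudApi and reconcile are hypothetical, and the "cloud" is just an in-memory dictionary. The idea it illustrates is that developers describe the desired infrastructure as data, and an idempotent step applies only what has changed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceSpec:
    """Desired state for one infrastructure component (hypothetical schema)."""
    name: str
    kind: str  # e.g. "database" or "app-server"
    size: str  # e.g. "small" or "large"


@dataclass
class FakeCloudApi:
    """In-memory stand-in for a provisioning API (an assumption, not a real SDK)."""
    resources: dict = field(default_factory=dict)

    def get(self, name):
        # Returns the currently provisioned spec, or None if nothing exists yet.
        return self.resources.get(name)

    def create_or_update(self, spec):
        self.resources[spec.name] = spec


def reconcile(api, desired):
    """Apply the desired specs idempotently and return the names that changed."""
    changed = []
    for spec in desired:
        if api.get(spec.name) != spec:
            api.create_or_update(spec)
            changed.append(spec.name)
    return changed


if __name__ == "__main__":
    api = FakeCloudApi()
    desired = [
        ResourceSpec("payments-db", "database", "large"),
        ResourceSpec("payments-api", "app-server", "small"),
    ]
    print(reconcile(api, desired))  # first run provisions both resources
    print(reconcile(api, desired))  # second run finds nothing to change
```

Running the same step twice is safe because it compares desired state with current state before acting, which is what lets this kind of work live inside an automated CI/CD pipeline. The responsibility for getting the specs right still sits with the developer.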
Impact on Developer Efficiency If You Get It Wrong
Hiring developers is expensive, so giving them tools that improve productivity is worth the investment. And it’s not just the cost of labor. There is also the cost incurred when the code isn’t correct or doesn’t behave correctly: missed opportunities, reputational damage, incidents and outages, repair costs and lost customers.
Add to this the cost of developer burnout, frustration and time wasted on side issues arising from infrastructure misconfiguration (anything other than developing and deploying microservices successfully), and you have a recipe for potential disaster.
When something goes wrong in production, developers need tools that give them visibility into the quality of the code and the behavior of the infrastructure at every stage of the software development life cycle, so they can resolve issues quickly and get back to working on the code.
Cloud native developers can spend a quarter of their time or more on triaging, debugging and resolving incidents and outages. This is obviously not time spent on creating and delivering application code to production.
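A quick back-of-the-envelope calculation shows what that share of time means for a team. The sketch below is illustrative only: the team size and working hours are assumed values, not figures from any survey, and the function names are hypothetical.

```python
def incident_hours_per_week(developers, hours_per_week, incident_share):
    """Hours per week the whole team spends on incidents instead of new code."""
    if not 0 <= incident_share <= 1:
        raise ValueError("incident_share must be between 0 and 1")
    return developers * hours_per_week * incident_share


def full_time_equivalents(developers, incident_share):
    """The same loss expressed as a number of full-time developers."""
    if not 0 <= incident_share <= 1:
        raise ValueError("incident_share must be between 0 and 1")
    return developers * incident_share


if __name__ == "__main__":
    # Assumed example: 8 developers, 40-hour weeks, a quarter of time on incidents.
    print(incident_hours_per_week(8, 40, 0.25))  # 80.0 hours per week
    print(full_time_equivalents(8, 0.25))        # 2.0 developers
```

Under those assumptions, an eight-person team loses the equivalent of two full-time developers to triage and debugging every week.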
If you get it right, you will achieve the unique cloud native benefits of autoscaling, better resiliency, agility and an “always on” customer experience. But if you get it wrong, you will have wasted the largest part of your IT budget and squandered an opportunity to increase revenues and improve customer satisfaction. You may even have damaged your reputation and lost existing customers.
In other words, it’s worth investing in the tools that will help your developers get it right. Application tracing and monitoring tools designed for traditional enterprise computing do not handle cloud native computing requirements very well. They cannot handle the significantly larger volume of data that microservices generate in a cloud native environment, nor can they provide data that is specific to a microservice or directly pertinent to resolving an issue or incident with it.
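One way to see why the data volume grows so quickly: in label-based metrics systems, every distinct combination of label values becomes its own time series, so the upper bound on series for a single metric is the product of the number of values each label can take. The sketch below uses made-up label counts for a monolith and a microservices deployment, and a hypothetical sample format, to show both the growth and the kind of per-service filtering developers need.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on time series for one metric: product of distinct label values."""
    return prod(label_cardinalities.values())


def for_service(samples, service):
    """Keep only the samples whose 'service' label matches the given microservice."""
    return [s for s in samples if s["labels"].get("service") == service]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    monolith = {"host": 4, "endpoint": 50, "status": 5}
    microservices = {"service": 40, "pod": 10, "endpoint": 5, "status": 5}
    print(max_series(monolith))       # 1000
    print(max_series(microservices))  # 10000

    samples = [
        {"labels": {"service": "payments", "status": "500"}, "value": 3},
        {"labels": {"service": "ledger", "status": "200"}, "value": 120},
    ]
    print(for_service(samples, "payments"))
```

Even with these modest assumed counts, one metric goes from a thousand potential series to ten thousand, and a real application emits many metrics, not one.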
Using Chronosphere with Google Kubernetes Engine (GKE) leverages the strengths of these two cloud native pioneers to accelerate the development, deployment and monitoring of quality code, which improves developer productivity and helps you get the most from your scarce cloud native resources.
Conclusion
Modern web and mobile computing require modern tools, a new developer culture and modern engineering best practices. They also require modern observability and support for developer productivity, allowing developers to pay sufficient and proper attention to code quality and cloud resource utilization.
Everything is a race to the market: to release the app, to update the app, to fix any errors or bugs or slow-performing code, to recover from an incident or an outage as quickly as possible. This is the kind of competition in which organizations need all the help they can get, and they cannot afford to pay the price of poor developer productivity.
Developers can’t be happy and productive without the right developer experience, which means the right tools and support. They want to minimize the amount of time they spend resolving infrastructure issues and maximize the amount of time they spend delivering quality code to production.