Cloud, the real cost
Learn the basics of cloud bill shock, the paradox of the cloud, and cloud cost optimization.
I’ve already written two posts explaining cloud computing and its benefits. So, now it’s time to address the elephant in the room: cost. Is it worth externalizing your operations to a cloud provider?
Billing
Cloud providers usually bill you monthly, based on the usage in your account. They tend to offer tools for making an initial cost estimate and then for monitoring expenses. You can even set up alarms that trigger when your spending reaches a certain level.
Some factors they usually consider in the billing process are:
Factor | Description |
---|---|
Compute Power | Refers to CPU and memory; more power means higher cost; Azure allows elastic scaling. |
Storage | Amount: Costs based on the data quantity you want to store. Hardware: Choices in hardware type affecting speed and cost; options for long-term storage or low latency. |
Bandwidth | Billing is separate for ingress (incoming requests) and egress (outgoing data). |
Per Use | Billing based on service usage, request count, or configured entities like user accounts. |
Per Service | Some services charge a flat monthly fee. |
Region | Prices vary by the data center's location. |
Check more on Azure Developer Billing
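To make the factors in the table above more concrete, here is a minimal sketch of a monthly cost estimate. Every rate in it is a made-up placeholder rather than any provider's real price, and the names `Rates` and `estimate_monthly_cost` are hypothetical. The provider's own pricing calculator is the place to get real numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices in USD. Real prices vary by provider, service, and region."""
    vcpu_hour: float = 0.04             # compute power (assumed)
    storage_gb_month: float = 0.02      # storage amount (assumed)
    ingress_gb: float = 0.0             # incoming data, billed separately (assumed)
    egress_gb: float = 0.09             # outgoing data, billed separately (assumed)
    per_million_requests: float = 0.20  # per-use billing (assumed)
    flat_monthly_fee: float = 5.00      # per-service flat fee (assumed)


def estimate_monthly_cost(vcpus, hours, storage_gb, ingress_gb, egress_gb,
                          requests, rates=Rates(), region_multiplier=1.0):
    """Add up one line item per billing factor and apply a regional price factor."""
    compute = vcpus * hours * rates.vcpu_hour
    storage = storage_gb * rates.storage_gb_month
    bandwidth = ingress_gb * rates.ingress_gb + egress_gb * rates.egress_gb
    usage = requests / 1_000_000 * rates.per_million_requests
    subtotal = compute + storage + bandwidth + usage + rates.flat_monthly_fee
    return round(subtotal * region_multiplier, 2)


# Example: 2 vCPUs running all month, 100 GB stored, 50 GB served, 3M requests.
estimate = estimate_monthly_cost(vcpus=2, hours=730, storage_gb=100,
                                 ingress_gb=10, egress_gb=50, requests=3_000_000)
print(f"Estimated monthly bill: ${estimate}")
```

Even with placeholder rates, a sketch like this shows which factor dominates your bill and how it reacts when one input, such as egress, suddenly grows.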
In some cases, they have free tier services or services that you can use a number of times before being charged. It’s also a common practice to offer some kind of credit when you first register or based on your personal situation, for example, if you’re a student.
GPT4
What are cloud credits?
Cloud credits are a form of currency that can be used to pay for services provided by cloud providers. They are typically given to customers as part of promotional offers, partnerships, or as part of certain subscription packages.
Cloud credits can be used to try out, experiment with, or fully use any of the cloud services that the provider offers. This includes services related to computing power, database storage, content delivery, machine learning, data analytics, and more.
As a personal recommendation, review the different discounts, credits, and free tiers beforehand, or at least every time you plan to use a new service. Doing so can help you avoid "cloud bill shock".
GPT4
What is The Cloud Bill Shock?
"Cloud Bill Shock" is a term that is used to describe the unexpected or surprisingly high costs that a company or individual incurs from cloud services. This can occur due to a number of reasons:
- Lack of cost management and visibility: Without proper monitoring and management tools in place, it can be hard to keep track of how much is being spent on cloud services until the bill arrives.
- Over-provisioning: Sometimes, organizations may overestimate their cloud needs and end up paying for resources they don't actually use.
- Complex pricing models: Cloud service pricing can be quite complex, with costs based on many different factors such as data transfer, storage, processing power, and more.
- Unexpected traffic or usage spikes: If a cloud application suddenly gets a lot more traffic than usual, this can cause costs to skyrocket. This is particularly common for businesses whose website or application goes viral.
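Several of these causes come down to not noticing the spend until the invoice arrives. As a rough illustration, the sketch below projects month-end spend from the daily costs seen so far and compares it with a budget. The function names, the linear projection, and the 80% warning threshold are all assumptions made for this example; in practice you would rely on your provider's budget and alerting tools.

```python
def projected_month_end_spend(daily_costs, days_in_month=30):
    """Linear projection: average daily spend so far, extended to the full month."""
    if not daily_costs:
        return 0.0
    return sum(daily_costs) / len(daily_costs) * days_in_month


def check_budget(daily_costs, monthly_budget, days_in_month=30, warn_ratio=0.8):
    """Return 'over_budget', 'warning', or 'ok' for the spend observed so far."""
    spent = sum(daily_costs)
    projected = projected_month_end_spend(daily_costs, days_in_month)
    if spent >= monthly_budget:
        return "over_budget"
    # Warn when the trend points past the budget, or most of it is already used.
    if projected >= monthly_budget or spent >= warn_ratio * monthly_budget:
        return "warning"
    return "ok"


# A quiet month, a month trending too high, and a month with a one-day spike.
quiet = check_budget([0.8] * 10, monthly_budget=25)
trending = check_budget([1.0] * 10, monthly_budget=25)
spike = check_budget([0.8] * 9 + [40.0], monthly_budget=25)
print(quiet, trending, spike)
```

A check like this only helps if it runs often enough: a daily check would still catch the spike a day late, which is why alerts on shorter intervals are worth considering for public-facing services.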
Expenses
But how bad could it be? To answer that question, let's take a look:
Software Developers
This scenario is probably the trickiest because, as an individual, the bill directly affects your personal finances, which vary from person to person. There are scary stories like these:
Chris Short, for example, uses AWS for his Content Delivery Network (CDN) to scale his website for about $23.00 per month. After sharing a 13.7GB file that became unexpectedly popular, he awakened to a bill of $2,657.68 Tilaa Blog
Imagine you’ve been running a hobby project in the cloud for the last 6 months. Every month you paid 20 cents. Not enough to really care about. However one morning you notice a surprisingly large transaction of $2700. bahr.dev
Small and Medium Companies
Here are some experiences I've collected from users of different cloud providers:
I started my Monday morning with the usual routine: having a quick nosey through the analytics and logs. Naturally, I tend to start with production, but (thankfully!) it was when I got to the stats for our development environment that I was filled with dread. tdwright blog - Azure
A few years ago, I wrote a recursive lambda function that called itself over and over. Within a hours, we had rung up an excess of $600 dollars from the single function call. thenable - AWS
The team panicked, trying to figure out why the bill was so large, but as they started to damage the situation, the bill updated to $15,000. As the day progressed, the bill reached a final value of $72,000, all for two hours of cloud computing time. electropages - GCP
But this final story is very interesting:
Likely less between 1–3%. With some of the emerging micro architectures compute costs are near insignificant. Employee costs, sales and marketing are my largest costs, and relative to cloud infrastructure costs are negligible - like a rounding error! quora
Large Companies
I've identified a few public companies. Pinning down cloud expenses is difficult because companies usually don't break them out as a separate item in their income statements. So, I've relied on news reports and other publicly available information.
Company | Cloud Provider | Year | Annual Expense | Revenue | % | Source |
---|---|---|---|---|---|---|
Spotify | GCP | 2018 | 150m | 5.25b | 2.85 | Cnbc / Business of apps |
Lyft | AWS | 2019 | 80m | 3.61b | 2.21 | Cnbc / Business of apps |
| | AWS | 2019 | 125m | 1.14b | 10.9 | Cnbc / Statista |
Snap | GCP | 2019 | 400m | 1.7b | 23.5 | Silicon / Business of apps |
Bytedance | GCP | 2019 | 800m | 17.15b | 4.57 | Data Center Dynamics / Statista |
Airbnb | AWS | 2020 | 150m | 3.38b | 4.43 | Cnbc / Statista |
Robinhood | AWS | 2020 | 60m | 0.95b | 6.31 | Qastor / Business of apps |
Netflix | AWS | 2021 | 1b | 29.69b | 3.36 | Medium / Statista |
| | GCP | 2022 | 300m | 4.4b | 6.81 | Data Center Dynamics / Business of apps |
Percentage expenses over revenue:
- Average: 7.21%
- Median: 4.57%
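The gap between the average and the median comes mostly from one outlier. The short sketch below recomputes both from the percentage column of the table above, in row order, and then drops the largest value to show how much it pulls the average up. The variable names are only illustrative.

```python
from statistics import mean, median

# Percentage column of the table above, in row order.
cloud_spend_pct = [2.85, 2.21, 10.9, 23.5, 4.57, 4.43, 6.31, 3.36, 6.81]

average_pct = mean(cloud_spend_pct)   # about 7.2
median_pct = median(cloud_spend_pct)  # middle value of the nine rows

# Same average without the largest value (the 23.5% row).
without_largest = sorted(cloud_spend_pct)[:-1]
average_without_largest = mean(without_largest)

print(f"average: {average_pct:.2f}%")
print(f"median: {median_pct:.2f}%")
print(f"average without the largest value: {average_without_largest:.2f}%")
```

With only nine companies, the median is the more robust summary: a single heavy spender moves the average by about two percentage points.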
I didn't include OpenAI in the previous table because the data are projections, but it's worth mentioning:
Company | Cloud Provider | Year | Annual Expense | Revenue | % | Source |
---|---|---|---|---|---|---|
OpenAI - ChatGPT | Azure | 2023 | 255m | 200m | 127.5 | Business Insider / Business of apps |
Saving money
The paradox of cloud
You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it. a16z
In the a16z post that explains the paradox of the cloud, they mention the story of Dropbox:
When the company embarked on its infrastructure optimization initiative in 2016, they saved nearly $75M over two years by shifting the majority of their workloads from public cloud to “lower cost, custom-built infrastructure in co-location facilities” directly leased and operated by Dropbox.
Well, I must admit that those numbers are impressive, and I advocate cloud cost optimization (see more later). But what is clear is that infrastructure costs have to be taken seriously.
Cloud Cost Optimization
GPT4
What is Cloud Cost Optimization?
Cloud Cost Optimization is the process of controlling and reducing cloud spend by identifying mismanaged resources, eliminating waste, reserving capacity for higher discounts, and aligning with the best pricing models.
While researching Cloud Cost Optimization, I found a YouTube video from Google Cloud that goes through some strategies and even provides a matrix of potential optimizations, effort, and savings.
Following the idea of Google's diagram, I've put together a summary table with strategies you could consider:
Area | Strategy | Description | Effort | Savings |
---|---|---|---|---|
Analyzing Costs | Leverage Cloud-Provider Discounts | Use cloud provider discounts/offers | Low | Medium |
Analyzing Costs | Monitor & Analyze Costs | Monitor and cut unnecessary costs | Medium | Medium |
Containerization | Containerized Microservices | Use microservices in containers | High | Medium |
Containerization | Container Orchestration | Manage containers with Kubernetes | High | Medium |
Data Transfer | Optimize Data Transfer Costs | Reduce data transfer costs | High | Medium |
Instance Management | Rightsize Instances | Match instances to workload needs | Low | High |
Instance Management | Utilize Spot Instances | Use spare cloud capacity | Low | Medium |
Instance Management | Implement Auto-Scaling | Scale resources based on demand | Medium | High |
Instance Management | Use Reserved Instances | Commit to longer-term instances | Medium | High |
Serverless Computing | Serverless Data Processing | Use serverless for data processing | Low | Medium |
Serverless Computing | Implement Serverless Architecture | Use serverless computing | Medium | High |
Storage Management | Implement Storage Lifecycle Policies | Automate data storage policies | Low | Medium |
Storage Management | Optimize Cloud Storage | Choose the right storage class | Low | Medium |
Storage Management | Implement Cold Storage | Use cost-effective cold storage | Medium | Medium |
Unused Resources | Delete Unused Resources | Delete redundant resources | Low | Medium |
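One way to read the table above is to look for the strategies that combine the lowest effort with the highest savings. The sketch below encodes the rows and sorts them that way. The numeric ranking of Low, Medium, and High is an assumption made purely for ordering, and the effort and savings labels are the rough estimates from the table, not measurements.

```python
LEVEL = {"Low": 1, "Medium": 2, "High": 3}  # assumed ordering of the labels

# (strategy, effort, savings), copied from the table above.
STRATEGIES = [
    ("Leverage Cloud-Provider Discounts", "Low", "Medium"),
    ("Monitor & Analyze Costs", "Medium", "Medium"),
    ("Containerized Microservices", "High", "Medium"),
    ("Container Orchestration", "High", "Medium"),
    ("Optimize Data Transfer Costs", "High", "Medium"),
    ("Rightsize Instances", "Low", "High"),
    ("Utilize Spot Instances", "Low", "Medium"),
    ("Implement Auto-Scaling", "Medium", "High"),
    ("Use Reserved Instances", "Medium", "High"),
    ("Serverless Data Processing", "Low", "Medium"),
    ("Implement Serverless Architecture", "Medium", "High"),
    ("Implement Storage Lifecycle Policies", "Low", "Medium"),
    ("Optimize Cloud Storage", "Low", "Medium"),
    ("Implement Cold Storage", "Medium", "Medium"),
    ("Delete Unused Resources", "Low", "Medium"),
]

# Lowest effort first; within the same effort, highest savings first.
prioritized = sorted(STRATEGIES, key=lambda s: (LEVEL[s[1]], -LEVEL[s[2]]))

# "Quick wins": low effort and high savings.
quick_wins = [name for name, effort, savings in STRATEGIES
              if effort == "Low" and savings == "High"]

for name, effort, savings in prioritized:
    print(f"{effort:<6} effort, {savings:<6} savings: {name}")
```

By this reading, rightsizing instances is the natural starting point, followed by the other low-effort items, before moving on to changes that require rearchitecting.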
Conclusion
When I started writing this post, I wasn't sure what to expect. Many engineers are negative about using cloud computing, especially because of the cost.
I have included stories like Dropbox's, with impressive savings after cloud repatriation. The table in the Large Companies section also shows Snap spending close to a quarter of its revenue on the cloud. So, yes, cloud bills can grow very quickly.
Depending on the stage of your company and its financial situation, you may or may not have the resources to allocate to cloud cost optimization. It may not be your priority, and that's fine. But if you decide at some point during your journey to treat infrastructure cost as a first-class KPI and work on it, there are many strategies you can apply.
One of Amazon's principles is frugality, and they define it like this:
Accomplish more with less. Constraints breed resourcefulness, self-sufficiency, and invention. There are no extra points for growing headcount, budget size, or fixed expense. Amazon Leadership Principles
I'm bringing this up now because Cloud Cost Optimization can be approached in different ways. You might set up a dedicated team to monitor cloud activity and propose optimizations and architectural changes. But you could also embed it into your corporate culture as a principle, so that every team and employee applies it.
In any case, ALWAYS analyze your situation and make the best decisions based on your needs. And remember that you can count on Cloud Architects from different providers to understand and learn how to improve your architecture by choosing the right services.