Sigh...in the blink of an eye, AWS now has hundreds of thousands of SKUs, 180 different types of services, 17 different ways to launch a container, and dozens upon dozens of regions and availability zones. No wonder AWS billing has become so hard to understand. Is there a fix?
In this post, we'll cover the best cloud cost management platforms and cloud cost optimization tools for Amazon Web Services (AWS), with a focus on the needs of digital native companies (i.e., companies “born in the cloud,” like most tech startups). Whether through one platform or a combination of tools, a mature cloud cost management setup for a digital native company does the following for your organization:
Broadly promotes cost awareness.
Minimizes disruption of the developer workflow: For example, if you need to spend developer time tagging resources to get decent reporting, that’s a non-starter. Most digital native organizations we work with have less than 10% tagging coverage for their resources.
Provides insights proactively or as close as possible to the point of a decision, rather than purely retrospective.
Handles the reporting and organization of data for use by the Finance team.
Makes data easy to access with built-in dashboards and coverage reports, and provides key metrics for AWS usage.
Foots to the bills provided by the cloud providers (e.g., AWS, GCP or Azure).
Provides a deeper understanding of COGS through smart allocation or marginal cost accounting.
Doesn't unnecessarily commingle the responsibilities of Finance and Ops teams that are better left separate (i.e., FinOps). We believe that would be impractical for most startups.
At Taloflow, we know a lot about optimizing AWS bills
We actually built a now-discontinued product in this space. Our team has a significant amount of experience with Cloud Cost Management or Cloud Management Platforms (CMPs) because we built our own platform to tackle this market over a period of nearly two years. Naturally, we studied the competitive landscape, the velocity of feature development, the robustness of features, the quality of support, the viability of various pricing models, and on and on, for all the vendors that we competed with. We also helped our customers solve AWS cost problems ad-hoc and held monthly bill reviews with them. We engaged with cloud cost optimization champions with varied titles, ranging from CFOs to the Head of DevOps to Product Managers and CTOs.
Today, Taloflow is a buying insights platform for the tech stack. We help CTOs and other engineering managers at digital native companies pick the best cloud infrastructure and API products for their specific use case, saving them weeks of soul-sucking analysis and costly mistakes for every big decision they make.
AWS cost management tools we considered
Cloud cost management is no longer our business - we pivoted. One of the main reasons for the pivot is that we found the Cloud Cost Management space to be too saturated or noisy. We had a quality offering with Tim (our now discontinued cloud cost management service) but struggled to get new customers at a high enough clip to justify venture-scale.
Now that we’re on the other side of the table, helping prospective buyers of Cloud Cost Management Platforms vet tools for their use case, the first thing we can say is that there are easily 50+ seemingly good options out there. That is, until you dig deeper. We culled down the list to 16 for our analysis using a slightly subjective measure that we believe reflects some degree of market acceptance signals for the products: how frequently these products came up in our sales process or as comparisons in VC pitches.
Author's note: We have commercial relationships with some vendors mentioned in this post. We do not accept payment for inclusion.
But we didn't stop there! In addition to the products above, we looked at specialized tools that might not fit the mould of a "Cloud Cost Management Platform", but can be used in conjunction with one to have a more dramatic impact on cloud costs. We're talking about things like compute optimization using Spot or Rightsizing strategies, Cost Governance Automation and other related tooling. The great news is that a lot of these are open source or open core. The map below includes: Usage.ai, ProsperOps, InfraCost, Xosphere, NetApp Spot, Cast AI, Env0, GorillaStack, Cloud Custodian.
Before we dive deeper, it’s worth noting that the Cloud Cost Management category breaks down into a few helpful subgroups we can use to organize our thinking:
Enterprise "Cloud Center of Excellence"
Options like VMware CloudHealth and Apptio Cloudability are mostly fit for enterprises looking to build a “Cloud Center of Excellence”. CloudHealth and Cloudability are purveyors of the values of Cloud Financial Management and FinOps, respectively.
AWS Cost Optimization & Monitoring
Options like CloudForecast.io are more lightweight and turnkey ways to get some simple cost monitoring in place and slick almost-no-setup reports to your email or a Slack channel. These track your AWS cloud usage data and notify you of any spending thresholds that are crossed. At a high level, these tools help with cost efficiency, managing your AWS usage, and tracking monthly costs.
Options like CloudZero and Yotascale are focussed on the needs of digital native companies for whom cloud spend is a critical piece of COGS and where decision-making is closer to the engineering side. Cloudthread (YC S21) is newer on the scene, having just launched publicly in July 2021. However, like CloudZero and Yotascale, Cloudthread puts more emphasis on tracking unit costs, which we like.
Kubecost is the original mover in the Kubernetes-focussed side of cloud cost management, but both Yotascale and CloudZero have developed strong, comparable features in this area, whereas Harness's cost management is packaged within its broader ops platform.
As far as native options go, there's AWS Cost Explorer for tracking costs and alerting and AWS Trusted Advisor for finding waste. If you like creating pivot tables in Excel or Google Sheets, you can get a great amount of detail from AWS Cost and Usage Reports (frequently abbreviated as AWS CUR).
Quick note: The AWS Cost and Usage Report has the data that most cloud cost management tools need from you to get started. Depending on the size and complexity of your AWS usage and the interrelationships between services, these reports can easily run to hundreds of thousands, if not millions, of rows. They are also rife with inconsistencies and can be a pain to ingest. While more mature companies build their own ingestion pipeline to analyze these reports in custom tools, we don't recommend this approach for startups and digital natives.
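To give a feel for what "pivoting" a Cost and Usage Report involves, here is a minimal Python sketch that totals cost per service. It assumes a CSV-format report with the lineItem/ProductCode and lineItem/UnblendedCost columns; the four sample rows are made up for illustration, and a real report has far more columns, rows, and edge cases (credits, refunds, taxes) than this handles.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows using CUR-style CSV column headers (illustrative only).
SAMPLE_CUR = """lineItem/UsageStartDate,lineItem/ProductCode,lineItem/UnblendedCost
2021-07-01T00:00:00Z,AmazonEC2,12.50
2021-07-01T00:00:00Z,AmazonS3,1.25
2021-07-02T00:00:00Z,AmazonEC2,14.00
2021-07-02T00:00:00Z,AmazonDynamoDB,3.75
"""

def cost_by_service(cur_file):
    """Sum unblended cost per product code, like a one-dimensional pivot table."""
    totals = defaultdict(float)
    for row in csv.DictReader(cur_file):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return dict(totals)

totals = cost_by_service(io.StringIO(SAMPLE_CUR))

# Print services from most to least expensive.
for service, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{service:<16}${cost:>8.2f}")
```

This is fine for a one-off look at a small report. It is not a substitute for the ingestion pipelines or dedicated tools discussed in this post.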
We looked at 4 different examples of tooling strategies that are relevant to digital native companies. These are all primarily inspired by real companies and real use cases they have. From smallest to largest in terms of cloud cost footprint, they are:
Digital pharmacy startup built natively on AWS
SaaS security platform built on Kubernetes (Amazon EKS)
SaaS analytics platform built mostly serverless on AWS
Usage-based SaaS unicorn spanning several cloud providers
Company A: Digital pharmacy startup built natively on AWS
The digital pharmacy startup built natively on AWS (Company A) is still on AWS credits for a couple more months. The wise thing to do is to start building cost awareness in the team now, before cost overruns make a real impact on the startup's burn rate, not after credits run out. With a projected AWS bill between $5,000 and $7,000 a month and a team of half a dozen engineers, there is not going to be a dedicated team member for managing cloud costs, but there is clearly a need for more visibility. Ideally, the new tooling should not add overhead that would noticeably hurt development speed. As such, we will prioritize solutions that have a lightweight setup and are simple enough to fold into a small team’s workflow without much training on the core concepts. There is also a need for specialized tooling to optimize (not just monitor) the bill ASAP to maximize runway. Leveraging AWS Spot seems possible given the product has 7 auto-scaling groups behind it. While assistance buying Reserved Instances (RIs) and Savings Plans is a nice-to-have, it takes a definite back seat to using AWS Spot instances, which offer greater potential savings when used correctly.
Daily cost visibility reports delivered via email or Slack
Anomaly detection so the team can focus on what's important with their limited time
Focus on AWS native resources
Spot management to cut costs as much as possible (greater potential savings than reserving capacity or AWS Savings Plans)
No Reserved Instance or Savings Plans management solution added to the mix until use of AWS Spot is maximized. Spot has greater potential savings.
TLDR; Get going with Xosphere's Instance Orchestrator or NetApp Spot ASAP to reduce costs using AWS Spot, then pick between CloudZero and CloudForecast.io for the core cost management tool.
CloudForecast.io serves today’s needs in the simplest way while being completely free to use at the lowest tier (i.e., a free daily cost report in your inbox). It’s the default recommendation for pretty small startup teams.
CloudZero is most likely to be the better solution to “grow into” over time, given it has much more feature completeness in things that may matter later (Kubernetes support, marginal cost analysis, etc.) while being relatively easy to set up and understand compared to the other robust Cloud Cost Management products out there. Notably, CloudZero does not offer support for managing Reserved Instances and Savings Plans, although this is not an immediate concern for Company A given the emphasis on AWS Spot instances.
As for the Spot question, we think Xosphere Instance Orchestrator has an edge over NetApp Spot Elastigroup given Company A has limited time to dedicate to setting up AWS Spot. With Xosphere, Company A can get up and running on Spot fastest by simply applying tags to its instances. NetApp Spot, by contrast, requires a lengthy and complex conversion from the AWS tooling to the NetApp Spot tooling, along with remote access to the AWS account. Both tools are “set and forget” and are priced similarly in a sort of performance-based model (~15%), taking either a percentage of the AWS on-demand costs or a direct percentage of savings. Based on a review of Company A's AWS Cost and Usage Report (CUR) data, we think it's possible to save about 70% of the EC2 instances portion of the bill by using either tool.
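To see how the two fee bases play out, here is a rough back-of-the-envelope sketch in Python. The 70% savings rate and ~15% fee come from the paragraph above. The $4,000 monthly EC2 figure is hypothetical, and reading "a percentage of the AWS on-demand costs" as a fee on the on-demand cost of the managed instances is our assumption, so check each vendor's actual pricing terms.

```python
def net_monthly_savings(on_demand_cost, savings_rate, fee_rate, fee_basis):
    """Return (gross_savings, fee, net_savings) for one month."""
    gross = on_demand_cost * savings_rate
    if fee_basis == "savings":
        fee = gross * fee_rate            # vendor takes a share of what was saved
    elif fee_basis == "on_demand":
        fee = on_demand_cost * fee_rate   # assumed: share of the on-demand cost
    else:
        raise ValueError(f"unknown fee_basis: {fee_basis!r}")
    return gross, fee, gross - fee

EC2_ON_DEMAND = 4000.0   # hypothetical monthly EC2 spend in USD
SAVINGS_RATE = 0.70      # "about 70%" of the EC2 portion, per the review above
FEE_RATE = 0.15          # "~15%" fee, per the text

results = {}
for basis in ("savings", "on_demand"):
    gross, fee, net = net_monthly_savings(EC2_ON_DEMAND, SAVINGS_RATE, FEE_RATE, basis)
    results[basis] = net
    print(f"{basis:<10} gross=${gross:,.0f} fee=${fee:,.0f} net=${net:,.0f}")
```

Under these assumptions the two models land within a few hundred dollars of each other, which is consistent with the tools being "priced similarly." The gap widens if realized savings fall well short of 70%.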
Company B: SaaS security platform built on Kubernetes (Amazon EKS)
The SaaS security platform built on Kubernetes (Company B) is moving towards a nearly 100% Amazon EKS setup wherever possible. Amazon EKS is also 90% of the AWS bill for Company B. However, there is some DynamoDB usage that can be spiky, so it’s worth having some tracking in place for the sphere of AWS native services as well. Company B has had very stable costs for over a year, consistently ranging between $9,000 and $11,000 a month billed on AWS. However, Company B is also planning to scale up its customer acquisition significantly in the coming months. We noticed that adding customers for Company B has easily measurable impacts on costs - it operates very much on the margin. The CTO and an ops manager both keep an eye on the bill (they are the only ones that do) and want to keep it that way. Finally, Company B is looking to invest in a new data warehouse like Snowflake but has concerns about managing costs there too.
Period-over-period trend detection (not true anomaly detection) is all that's needed for the monitoring and alerting piece
Daily email alerts and summaries
Robust Kubernetes cost monitoring given that almost all compute will run on Kubernetes soon
Some ability to drill into non-Kubernetes related services given the relative spikiness of services like DynamoDB
Some manual assistance buying RIs and Savings Plans about once a year
Support for tracking spend on the non-Big 3 clouds is a nice-to-have
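For a sense of how simple period-over-period trend detection can be, here is a minimal Python sketch that compares the most recent week of daily costs to the week before it. The 7-day period, the 20% alert threshold, and the daily figures are all hypothetical values chosen for illustration; they are not taken from any of the tools discussed here.

```python
def period_over_period(daily_costs, period_days=7, threshold=0.20):
    """Compare the latest period to the one before it.

    Returns (fractional_change, alert), where alert is True when the
    absolute change exceeds the threshold.
    """
    if len(daily_costs) < 2 * period_days:
        raise ValueError("need at least two full periods of data")
    current = sum(daily_costs[-period_days:])
    previous = sum(daily_costs[-2 * period_days:-period_days])
    if previous == 0:
        raise ValueError("previous period has zero cost; change is undefined")
    change = (current - previous) / previous
    return change, abs(change) > threshold

# Hypothetical daily AWS costs in USD: a flat week, then a week that ramps up.
costs = [300] * 7 + [300, 320, 350, 375, 400, 430, 450]

change, alert = period_over_period(costs)
print(f"week-over-week change: {change:+.0%}, alert: {alert}")
```

A check like this only says that spend moved. It does not say which service or customer drove the change, which is where the drill-down requirements above come in.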
TLDR; Demo with both Kubecost and CloudZero.
Kubecost has for several years developed a robust Kubernetes cost monitoring solution. It's in the name ("Kube") and they are largely the pioneers for this facet of Cloud Cost Management! However, we're still figuring out how well their numbers can foot to the portion of the bill for AWS native services.
CloudZero is capable of monitoring Kubernetes and AWS-native services equally well. It's probably the better “umbrella” solution to address Company B's concerns about non-EKS workloads. A major plus is that CloudZero has the only turnkey solution for Snowflake cost monitoring. If Company B chooses Snowflake for its data warehouse, this could be a great match.
Bonus: Company B should keep an eye on CloudForecast.io's Kubernetes cost monitoring release if there is more time to make a buying decision.
Company C: SaaS analytics platform built mostly serverless on AWS
The SaaS analytics platform built mostly serverless on AWS (Company C) has a wicked smart software engineering team. They have one of the best architectures to stay as close to the margin as possible, at least for when things go according to plan with deployments. The team is used to spending hours upon hours pivoting data from AWS Cost and Usage reports to determine if and when issues occur (+ they love Grafana and have been looking to wedge cloud cost reporting into it). However, this strategy is not working for Company C, at all! Their scale-to-zero infrastructure also can go the other way, and do so quickly. Without proper anomaly detection in place, the team is doomed to play cloud cost roulette with its non-stop software releases and its rapid growth. Oh, and the team is a tad price sensitive, particularly when it comes to paying the "cloud tax", i.e., the flat percentage of the bill most cloud cost management companies have built into their revenue model. This tax is usually 2-3% of your overall AWS cloud spend.
Cost metrics that are as real-time as possible given the high potential for cost swings
Focus on AWS native
Assistance keeping an internal review cadence for cost anomaly post-mortems
Suitable for a small engineering team
Cheap and no cloud tax
Reserved Instance and Savings Plans purchasing help once in a while
Answer this: "Do I have to use yet another dashboard?"
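To illustrate how anomaly detection differs from a simple period-over-period comparison, here is a minimal Python sketch that scores each new cost reading against the median of a trailing window, using a modified z-score. This is one common textbook approach, not how any vendor mentioned here implements it. The window size, the 3.5 threshold (a conventional rule of thumb), and the hourly figures are assumptions for illustration.

```python
from statistics import median

def find_anomalies(costs, window=7, threshold=3.5):
    """Flag readings that sit far from the median of the preceding window.

    Returns a list of (index, cost, score) tuples.
    """
    anomalies = []
    for i in range(window, len(costs)):
        history = costs[i - window:i]
        med = median(history)
        # Median absolute deviation: a spread measure that resists outliers.
        mad = median(abs(x - med) for x in history)
        if mad == 0:
            # Perfectly flat history: this sketch skips the point rather than
            # divide by zero. A real detector needs a fallback here.
            continue
        score = 0.6745 * (costs[i] - med) / mad
        if abs(score) > threshold:
            anomalies.append((i, costs[i], round(score, 1)))
    return anomalies

# Hypothetical hourly costs in USD with one runaway hour.
hourly = [10, 11, 9, 10, 12, 10, 11, 10, 48, 11]

anomalies = find_anomalies(hourly)
for index, cost, score in anomalies:
    print(f"hour {index}: ${cost} looks anomalous (score {score})")
```

Because each reading is scored against recent history rather than against a fixed budget, this kind of check can catch a scale-to-zero service suddenly scaling the other way within a reading or two, provided the cost data arrives frequently enough.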
TLDR; There really is no perfect match here. The closest paths would be to (1) go with CloudForecast.io, or (2) combine CloudZero with Usage.ai for Savings Plans and Reserved Instance management. We also recommend booking a demo with Cloudthread (YC S21) to see what might be coming around the corner.
CloudForecast.io will be inexpensive compared to the rest of the field, has good Savings Plans and Reserved Instance planning tools, and has a small yet accessible team that can help with post-mortems for significant cost events. CloudForecast.io is steering away from becoming too feature-rich, cumbersome, and enterprise-minded, which may align well with the needs of a smaller startup. The real shortcoming that CloudForecast.io has in Company C’s case is its lack of anomaly detection. Close, but no cigar.
Yotascale or CloudZero may be intriguing options because of their robust anomaly detection features. However, Company C is cost-sensitive, and the breadth of these two tools may be overkill. CloudZero also does not offer Reserved Instance and Savings Plans management features in its core product, so Company C would have to add another vendor relationship to tackle this aspect if CloudZero is considered in the future. Luckily, RI and Savings Plans management “is sold separately” with really good offerings that automate most of the work, like Usage.ai. While two vendor relationships are not ideal, working with Usage.ai is purely performance-based, as they monetize by taking 20% of ongoing savings. This approach will be worth noting down for when Company C has more sprawl and distributing specific views to different engineering teams becomes a life-saver in the cloud cost control effort.
Reserved.ai is transitioning away from being a solely RI and Savings Plans focussed tool and has decent anomaly detection. However, this transitional phase means that the product itself may be in flux - we’re still learning more.
Cloudthread is worth keeping an eye on given its emphasis on integrating with existing monitoring solutions like Grafana and DataDog and its rapid execution of a unit cost-based monitoring vision, which aligns well with the decisions Company C needs to make.
Company D: Usage-based SaaS unicorn spanning several cloud providers
The usage-based SaaS unicorn spanning several cloud providers (Company D) is at the point where getting a deep understanding of how product and customer mix affect cloud costs is a must. Steamrolling towards an exit or IPO means you’re going to get scrutinized for this stuff...at least in normal times. Unfortunately, not many providers have anything close to an out-of-the-box solution for this. Besides this, the company's team is nearly doubling in size year-over-year and has hundreds of engineers. How in the world is it going to meet these needs while hopefully staying clear of the enterprise-oriented platforms like CloudHealth? Oh, it’s worth noting that the finance team deeply cares about cloud costs but has the right attitude to empower developers in managing this process.
Author's note: Friendly reminder that this blog post is focussed on the needs of digital native companies, not enterprise customers.
Granular insights with great cost allocation across teams
Daily visibility on costs
Flexible and digestible "plain English" reports for the finance team
Kubernetes insights at the pod and cluster level would be great because 99% of EC2 is running on Kubernetes
RIs and Savings Plans management is critical
Marginal cost analysis to provide a set of objective measures the team wants to use to track progress
TLDR; There's no silver bullet at this stage of rapid growth and with these requirements. Book a demo with Yotascale and CloudZero. Do not go for CloudHealth or Cloudability yet.
Yotascale covers nearly all of the objectives for the use case, including actionable insights around purchasing Savings Plans and Reserved Instances and robust enough Kubernetes and container support. However, it does not address the marginal cost analysis requirements from the finance team.
CloudZero, we believe, has better Kubernetes insights. Combined with a provider like Usage.ai to cover the gap in Savings Plans and RI management, it can get close to meeting the strategic requirements listed above. The biggest differentiator that puts CloudZero at an advantage for Company D’s longer-term needs (rather than “the here and now”) is its fully-fledged deconstructive approach to monitoring and reporting on marginal cost and average cost changes throughout the stack. Its platform can also do this with minimal tagging in place, which is great because Company D has less than 10% tagging coverage to start with. The marginal cost analysis part may help Company D accomplish not only the goal of optimizing cloud costs, but also the goal of aligning other aspects of its business (i.e., product and customer mix) towards greater profitability.
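For readers less familiar with the distinction between average cost and marginal cost, here is a small Python sketch of the arithmetic. It is a generic illustration, not CloudZero's method: average cost is total cost divided by units served, and marginal cost is the change in cost divided by the change in units between two periods. The monthly figures and the choice of "customers" as the unit are hypothetical.

```python
def unit_costs(periods):
    """Compute average and marginal cost per unit for each period after the first.

    periods: list of (label, total_cost, units) tuples in chronological order.
    Returns a list of (label, average_cost, marginal_cost) tuples; marginal_cost
    is None when the unit count did not change.
    """
    rows = []
    for prev, curr in zip(periods, periods[1:]):
        _, prev_cost, prev_units = prev
        label, cost, units = curr
        average = cost / units
        delta_units = units - prev_units
        marginal = (cost - prev_cost) / delta_units if delta_units else None
        rows.append((label, average, marginal))
    return rows

# Hypothetical monthly cloud cost (USD) and active customers.
history = [
    ("Jan", 100_000, 400),
    ("Feb", 108_000, 440),
    ("Mar", 126_000, 500),
]

rows = unit_costs(history)
for label, average, marginal in rows:
    marginal_text = "n/a" if marginal is None else f"${marginal:,.2f}"
    print(f"{label}: average ${average:,.2f}/customer, marginal {marginal_text}/customer")
```

In this made-up data, the marginal cost per customer starts below the average and then climbs above it. That would suggest the newest customers are more expensive to serve than the existing base, which is exactly the kind of product and customer mix signal a finance team would want to see.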
In short, tracking your AWS costs doesn’t have to be complicated, but it really is without dedicated tooling in place. Consider using the above tools to help improve cost management for your AWS infrastructure. Combined with general best practices on keeping costs manageable, your finance team should end up thanking you...eventually.
Need help with choosing an AWS cost management tool? At Taloflow, we deliver tailored buying insights for dev tools and cloud infrastructure. Whether you’re looking for help with validating a use case fit or need product recommendations, get started with Taloflow by clicking the banner below.