Tech Decisions
February 29, 2024

Episode 2 | Micah Wheat on Cloud Cost Management in the Age of AI

Learn what market changes have made cloud cost management more important, including the impact of AI on cloud spend and the misrepresentation of cost structures by AI companies.

Summary

In this episode, the hosts discuss cloud cost management with guest Micah Wheat, co-founder of Dashdive. They explore the formation of Dashdive and the changes in the market that have made cloud cost management more important. They also discuss the use of arbitraging tools and the challenges of amortizing costs and pricing models. The conversation covers the differences between cloud cost observability and cloud cost management and the importance of granularity in cost attribution. The hosts and guest also delve into the impact of AI on cloud spend and the misrepresentation of cost structures by AI companies. The episode concludes with key takeaways, including the need for tools and the importance of aligning revenue and costs.

Key Takeaways

  • Cloud cost management has become more important due to changes in the market and the introduction of AI features.
  • Tools that provide cloud cost observability can help optimize costs and improve profitability.
  • Granularity in cost attribution is crucial for accurate cost management.
  • Aligning revenue and costs is essential for optimizing profitability.
  • AI companies may misrepresent their cost structures and need insights on unit and marginal costs.

Transcript

Louis-Victor Jadavji: [00:00:00] Hey everyone, welcome to the Tech Decisions Podcast, where we take the soul sucking work of researching enterprise technology and make it accessible, engaging, and insightful. Each episode dives into the vast world of tech solutions, breaking down complex choices into manageable insights to help you make informed, strategic decisions.

Whether you're a seasoned IT professional or a curious newcomer, Tech Decisions is your go-to resource. Tech Decisions is sponsored and hosted by the team at Taloflow, the leading technology selection platform to evaluate vendors for your exact use case. Today, we'll be discussing the wonderful world of cloud cost management.

But before that, let's do some introductions. I'm LV, the co-founder and CEO at Taloflow. I'd like to introduce my co-host, Abhishek Singh. Abhishek is the manager of enterprise systems integration at Toast. Prior to this, he was a principal analyst at Gartner in the Application Architecture, Infrastructure and Integration group.

I'd like to introduce my co-founder at Taloflow, Todd Kesselman. [00:01:00] Abhishek and I asked Todd to join us today because the subject we'll be discussing with our guest is an area where Todd is an expert. And finally, our guest, Micah Wheat. Micah is the co-founder of Dashdive, formerly known as Rosettic, a Y Combinator-backed company that he co-founded with Adam Shuggar, whom he met while at Stanford. Dashdive is a cloud cost observability tool that integrates seamlessly with major cloud services to break down multi-tenant cloud costs by feature. Welcome, everyone. Hey, everybody. Welcome. Take us away.

Abhishek Singh: So Micah, I'm just curious to know how did you and Adam come up with this idea of forming a company?

Micah Wheat: Yeah, we took kind of a circuitous path to this company. Like many college students, we actually started working on a startup in college, out of our Stanford dorm room. We launched a social app called Fleet. There's a reason you haven't heard of it; it never really got off the ground.

But that was our first experience deploying apps [00:02:00] on AWS, and we found the billing systems there to be very byzantine, a black box. Then, after getting into YC, we decided to pivot and started brainstorming different ideas. We were pretty clear we wanted to do SaaS; it was just a better skill set match for us. We came up with a couple of ideas for developer tools, got interested in the cost management space, talked to about 50 engineering leaders everywhere from startups all the way up to enterprises, and kept hearing about the problems that stem from this lack of visibility into cloud costs.

At the startup level, for SaaS startups, it often has to do with pricing. It's really hard to optimize your pricing if you don't know what each customer is costing you, or which types of usage and which features correlate with costs and how that affects your infrastructure costs. And then at the enterprise level, it often has to do more with understanding the per-team [00:03:00] breakdown. We talked to one director of engineering at a public IoT company who told us that their top priority for the year was to improve gross margin, and they really had no idea what levers to pull to meet that goal because they couldn't see how much each business unit was costing the company.

Louis-Victor Jadavji: Cool. So Dashdive's journey has always interested me, because at Taloflow we actually spent two years in the space trying to do something quite similar, and of course we eventually pivoted for various reasons.

First and foremost, the market seemed a bit saturated. There wasn't much uptake; folks were focused on growth at all costs instead of trying to maximize profitability. So what do you think is different now compared...

Micah Wheat: To the market in 2019, 2020?

Louis-Victor Jadavji: Exactly. Yeah. What do you think is different now?

Micah Wheat: So I think there have been two important, kind of tectonic, shifts in the market. The first is that the macro environment is completely different. Fundraising has gotten much [00:04:00] harder and also much more uncertain, right? Planning to raise another massive round to feed just maximal growth isn't necessarily as good a strategy as it was back in 2020. I actually still don't think it was a great strategy back in 2019, 2020, but now, for most people, it's just not an option. The second thing is that we're seeing lots of SaaS companies in particular adding AI features.

And the business model of AI products is fundamentally different from what you might call the first wave of SaaS products, which had much higher gross margins, right? And so now compute costs are going up; the public cloud providers are just going gangbusters right now. What's different about these AI-related infrastructure costs is that they're predominantly variable in nature. In other words, they're largely a function of customer usage, and they may not even be uniformly distributed on a per-usage-event basis. So [00:05:00] that's why I think we're seeing a lot of people start to switch to more usage-based pricing models, to try to incentivize their customers to limit usage in a way that makes sense for the company's cost basis. I think it's those two things: it's the funding environment, and then it's companies increasingly adding AI features and wanting to be able to measure the ROI of those features, because they are often quite costly.

Louis-Victor Jadavji: Yeah, and to double click on that: with these cost pressures that these companies are facing, I imagine there's a lot of emphasis on a quick fix, and I keep seeing all these arbitraging tools pop up, like Pump and Usage and various others. So can you break down what's happening here, in this phase where everyone's looking to arbitrage reserved instances or leverage spot instances, and that's the main fix that's top of mind?

Micah Wheat: I'd say there are two levers you can pull if you're trying to optimize your cloud costs. The first is [00:06:00] to leverage some of the low-hanging fruit around optimizing your reserved instances, and then there are also these kinds of group buying schemes. There's a ton of tools out there that are pretty good at ingesting your cost and usage data from across all your providers, including things like Snowflake and all three major cloud providers, whatever it is, and then displaying that all in a single pane of glass, as they like to say, giving a nice dashboard to your pricing team, your finance team, and your engineering teams as well. And the key feature, how these tools show positive ROI, is that they basically work to automatically optimize your reserved instances to get you a discount. So they look at past usage, try to forecast future usage, and then reserve the optimal amount of compute for you.

The second lever has to do more with [00:07:00] creating a cost culture within the business: having every engineer, every manager be aware of the costs they're incurring and then justify them. The other piece of that is making sure that your pricing makes sense given your cost structure, so that you're actually incentivizing customers to limit their usage in ways that create value for both businesses.
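To make that first lever concrete, here is a minimal sketch of how a tool might size a reserved-instance commitment from past usage. The percentile threshold, rates, and usage numbers are assumptions for illustration, not Dashdive's or any specific vendor's actual algorithm or pricing.

```python
def recommend_reservation(hourly_usage, baseline_percentile=20):
    """Suggest how many instances to commit to, given past hourly instance counts.

    Reserving roughly the level you exceed about 80% of the time keeps the
    commitment busy; spikes above it are served on demand.
    """
    usage = sorted(hourly_usage)
    return usage[int(len(usage) * baseline_percentile / 100)]


def blended_cost(hourly_usage, reserved, ri_rate=0.06, on_demand_rate=0.10):
    """Reserved capacity is paid for every hour; anything above it is billed on demand."""
    reserved_cost = reserved * ri_rate * len(hourly_usage)
    spill = sum(max(u - reserved, 0) for u in hourly_usage)
    return reserved_cost + spill * on_demand_rate


# A synthetic week of hourly instance counts (assumed numbers).
usage = [8, 9, 10, 12, 15, 11, 9] * 24
plan = recommend_reservation(usage)
print(plan)                                 # baseline to reserve
print(round(blended_cost(usage, plan), 2))  # cost with the reservation
print(round(blended_cost(usage, 0), 2))     # cost fully on demand
```

The idea is simply to commit to a conservative baseline and let on-demand capacity absorb the spikes.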

Louis-Victor Jadavji: Yeah. That's especially relevant with all the costs of the various GPT APIs and so on.

Todd Kesselman: So have you found, or what do you think the thinking is, on amortizing the sunk costs at the beginning of an AI project? Right, we're going to have to do a whole bunch of work, and then...

Micah Wheat: The marginal costs. The model training costs.

Todd Kesselman: Right. So I'm just curious where you see the industry on that right now, and how that's being handled vis-à-vis the pricing models out to the customer.

Micah Wheat: That's the [00:08:00] big question that there's not a clean solution for: how do we allocate these fixed costs? And it really just depends on the business, right? Some people want to allocate them evenly across all customers, and some people want to say, maybe we made this model primarily for this customer, so we want to allocate it all to that customer. There's also the issue of having headroom, right? Over-provisioning certain reserved instances so that you can meet peak loads, spikes, and things like that. So different people think about it differently, I think, is the unfortunate answer, and we at Dashdive can work with you to try to think through what's going to make the most sense for your use case.
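As a rough illustration of the allocation choices Micah describes, here is a sketch of an even split versus a usage-weighted split of a fixed training cost. The figures and customer names are hypothetical, and neither strategy is being presented as Dashdive's recommendation.

```python
# Two simple ways to amortize a fixed model-training cost across customers.
# The training cost and usage figures below are hypothetical.
training_cost = 120_000.0
monthly_inferences = {"acme": 900_000, "globex": 90_000, "initech": 10_000}

# Strategy 1: split the fixed cost evenly across all customers.
even_split = {c: training_cost / len(monthly_inferences) for c in monthly_inferences}

# Strategy 2: weight the allocation by each customer's share of usage.
total_usage = sum(monthly_inferences.values())
usage_weighted = {c: training_cost * n / total_usage
                  for c, n in monthly_inferences.items()}

print(even_split)      # {'acme': 40000.0, 'globex': 40000.0, 'initech': 40000.0}
print(usage_weighted)  # {'acme': 108000.0, 'globex': 10800.0, 'initech': 1200.0}
```

Per-customer margins can look very different under the two strategies, which is why the choice has to fit the business.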

Todd Kesselman: So your tool right now is... oh, go ahead. I'm sorry. Go ahead. No, I insist.

Abhishek Singh: So I had a question, mostly around the terminology that you use in your product description, which is cloud cost observability. Most of [00:09:00] the industry is focused on the terms cloud cost management and FinOps. So can you explain the differences between these terminologies, for us as well as for the viewers?

Micah Wheat: For sure. The reason we chose to describe our product as cloud cost observability was primarily to try to elucidate the difference between our product and a lot of the other ones on the market that tend to call themselves cloud cost management or cost-cutting tools. These are a lot of the ones we were talking about that basically just take the data from your cloud provider and then, in the background, help you automatically optimize your reserved instances. That is a very different product from what we offer at Dashdive. What we do is actually collect additional data that you couldn't get out of AWS or whatever cloud provider you're using, at the very granular usage-event level: at this time on this day, this customer invoked this [00:10:00] feature, this API endpoint, which used however much storage, CPU, memory, whatever, and that had this cost for you. Currently there's no other tool on the market that can do that, and that's why we use the term observability: to suggest that, in the nuts and bolts of how it mechanically works, our product is much more akin to what the application performance monitoring industry does. So something more like a Datadog, right? Getting this very granular, user-specific data that none of these other cloud cost management tools can get.

As for FinOps specifically, I like the term. The FinOps Foundation does a lot of great work and has put out a lot of great thought leadership on this subject, but it's just not a household name, really. And frankly, I don't find it to be the sexiest marketing around. FinOps doesn't sound fun.
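To picture the kind of usage-event-level data being described, here is a small sketch with hypothetical events and a simple roll-up by feature, or by customer and feature. The field names, endpoints, and costs are illustrative assumptions, not Dashdive's actual schema or rates.

```python
from collections import defaultdict

# Hypothetical usage events of the kind described above.
events = [
    {"ts": "2024-02-12T09:14:03", "customer": "acme", "feature": "ai_summarize",
     "endpoint": "/v1/summarize", "cpu_seconds": 4.2, "gb_stored": 0.01, "cost_usd": 0.031},
    {"ts": "2024-02-12T09:14:09", "customer": "acme", "feature": "search",
     "endpoint": "/v1/search", "cpu_seconds": 0.3, "gb_stored": 0.0, "cost_usd": 0.002},
    {"ts": "2024-02-12T09:15:41", "customer": "globex", "feature": "ai_summarize",
     "endpoint": "/v1/summarize", "cpu_seconds": 5.0, "gb_stored": 0.02, "cost_usd": 0.038},
]

def rollup(events, keys):
    """Aggregate event-level cost along any combination of dimensions."""
    totals = defaultdict(float)
    for e in events:
        totals[tuple(e[k] for k in keys)] += e["cost_usd"]
    return dict(totals)

print(rollup(events, ["feature"]))              # cost per feature
print(rollup(events, ["customer", "feature"]))  # cost per customer per feature
```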

Abhishek Singh: Thanks Micah and over to you, Todd.

Todd Kesselman: Oh, so let's get into a [00:11:00] little bit of the nitty gritty, right? Let's start with your philosophy on attribution, and how granular it needs to be in order to actually get useful data out.

Micah Wheat: Yeah. Like I said earlier, I think there's value in what some of these cloud cost dashboard, cloud cost management tools can do. But it's somewhat limited when you compare it to having ultimate granularity. The reason for that is that, to a certain extent, if you're just taking this aggregate data and you don't have the event-level data, you're forced to make some pretty big assumptions that can lead you astray.

Specifically, you have to make estimates based on average total costs, right? So we have N customers using database A; we're going to take the cost of database A, divide it by N at the end of the month, and create an estimate [00:12:00] for how much each customer cost us. Particularly with due diligence and M&A, we see companies trying to estimate their gross margin like this. And ultimately, with the costs on database A, for example, you're going to have internal tools using it, you're going to have R&D getting done on that database or instance or what have you, or you're going to have marketing, or any number of things that are not a component of COGS. So that's going to cause the company in this example to under-report their gross margin, which can have significant implications for the multiple they're able to get when...
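To see why the divide-by-N estimate can understate gross margin, here is a small worked example with made-up numbers; the split between customer-driven and internal usage is an assumption for illustration.

```python
# Illustrative numbers only, to make the divide-by-N pitfall concrete.
monthly_db_bill = 100_000     # total monthly bill for the shared "database A"
customer_share = 0.70         # portion actually driven by customer traffic
revenue = 250_000
n_customers = 50

# Naive estimate: treat the whole bill as COGS and split it evenly.
naive_cogs = monthly_db_bill
naive_margin = (revenue - naive_cogs) / revenue            # 0.60

# Event-level attribution: only customer-driven usage counts as COGS;
# internal tools, R&D, and marketing workloads are excluded.
attributed_cogs = monthly_db_bill * customer_share
attributed_margin = (revenue - attributed_cogs) / revenue  # 0.72

print(f"naive gross margin: {naive_margin:.0%}, attributed: {attributed_margin:.0%}")
print(f"naive cost per customer: {naive_cogs / n_customers:,.0f} "
      f"vs attributed: {attributed_cogs / n_customers:,.0f}")
```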

Todd Kesselman: We did a bunch of this for AI clients and found one of the issues is actually noise in the detail, which is what I was asking about. There's a fine line between the noise and the detail.

Micah Wheat: Because the systems...

Todd Kesselman: You get things like the marginal costs varying from day to day, from hour to hour, from minute to minute, literally because of other [00:13:00] things that are going on in the system. If you have a shared CPU resource, right? How long it takes you to run something might depend on what else is hitting that unit at the same time, which then changes your economics completely. So that was what I was asking: how you're handling it when you produce marginal costs. If the intervals get too short, there's too much noise and a loss of credibility in the data. Too long, and you lose information. So I'm just curious how you guys are addressing that.

Micah Wheat: Yeah. Like I said, we collect data at the most granular level possible, at the level of individual usage events. And then, like you're getting at, it's basically all about how you roll this up to get the insights out. In our dashboard, it's configurable: the time slices in the time series data are configurable. What we start with tends to be a 12-hour view, which we find to be pretty helpful, but that's completely configurable in our dashboard, and [00:14:00] the optimal view is going to depend on the circumstances.
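Here is a minimal sketch of that kind of configurable roll-up: bucketing event-level costs into fixed time slices, 12 hours by default. It is an illustration under assumed event data, not Dashdive's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def bucket_costs(events, window_hours=12):
    """Roll per-event costs up into fixed time slices; the window is configurable."""
    window = timedelta(hours=window_hours)
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(float)
    for e in events:
        ts = datetime.fromisoformat(e["ts"])
        slice_start = epoch + ((ts - epoch) // window) * window  # start of the slice
        buckets[slice_start] += e["cost_usd"]
    return dict(sorted(buckets.items()))

# Hypothetical events with illustrative timestamps and costs.
events = [
    {"ts": "2024-02-12T09:14:03", "cost_usd": 0.031},
    {"ts": "2024-02-12T18:02:11", "cost_usd": 0.002},
    {"ts": "2024-02-13T01:45:59", "cost_usd": 0.038},
]
for start, cost in bucket_costs(events).items():
    print(start.isoformat(), round(cost, 3))
```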

Todd Kesselman: So then, is your dashboard taking into account any correlative impacts, or are you following things through? If I have an event at A that forces service B, which then forces service C, which forces service D, what are you guys doing with that? Just curious.

Micah Wheat: It depends on how they want to define the feature, right? That's a big part of what we do: breaking down cost by feature. So if what the customer is most interested in is the total cost across all services invoked for one feature, we can totally do that and roll all those costs up into one line in a spreadsheet for them. On the other hand, some companies are more interested in their cost breakdown by service, and you can similarly get a per-feature, per-customer, per-team, really whatever-you-want breakdown by [00:15:00] cloud service. It really depends on what the customer is interested in, and because we collect the most granular data possible, we can really support whatever they want.

Louis-Victor Jadavji: And Todd, do you agree with that approach?

Todd Kesselman: I think ultimately it's a walk-before-you-run thing. Right now, just getting engineers to think about marginal cost versus average cost, and to get that integrated back into the whole product cycle and decision-making flow, we're at square one there in terms of the modern enterprise. So you have to walk. But as we found out, ultimately it's a complex, interrelated system, and the problem is that you're trying to get marginals on what would be a partial, right? Because everybody wants to look at things as, if I just serve this one thing to this customer, what's my cost for it? And [00:16:00] sometimes you can do that, because the system and the math allow you to, but sometimes you can't, because it's really dependent on multiple things. So what we found is that at some point you end up with some estimation. You can't do it completely deconstructed; at some point you have to take the data that you have and have somebody who knows how to look at it correlate it back up in order to get really accurate flows. But again, you've got to at least have the data to start with, and we found most companies just didn't have the data. Your tool would be great, because they were going in and trying to estimate these things from a very high level, which is obviously not going to be as accurate.

Louis-Victor Jadavji: And you're installing agents, right, Micah, to actually track this kind of data?

Micah Wheat: Yeah, we offer a couple of installation options. One is an agent that sits in the Linux kernel of the customer's virtual machines and, [00:17:00] with a super, super simple install, kicks all this granular usage data out to us. Or you can also self-host it.

Todd Kesselman: What do you guys do to get the dialogue going between the marketing areas and engineering? Is there any tool set that you provide? Because that was the other thing we always found interesting: developers are just not used to thinking this way, and they get a little nervous when somebody starts looking at the cost of their code.

Micah Wheat: It's a radical change from how business has been done over the last 20 years, so yeah, totally.

Todd Kesselman: So do you guys provide any tools to make that easier on them?

Micah Wheat: Now you're getting into kind of the philosophical, trying to change the culture of the organization. We're pretty much just in the business of providing data right now, but we plan in the future to roll out additional features around recommendations and other things like that.

Louis-Victor Jadavji: So on the culture standpoint: not many [00:18:00] organizations have a good cost culture for cloud costs. What would make a good cost culture, for those that strive to have one?

Micah Wheat: Yeah, this is maybe more of a philosophical point, but I don't tend to put that much stock in culture. I tend to think more in terms of incentives. And basically, right now there can be no incentives if none of these costs are being tracked, and there can be no accountability for costs if we're just getting this black-box bill from AWS every month and we have no idea who's responsible for what. So I think the first step is really just getting the data, right? It's the whole cliché that what gets measured gets managed; I think that's totally true.

I was just reading, I don't know if you guys have read it, the Elon Musk biography that just came out, the Walter Isaacson book. Fantastic read if you haven't. One thing that really struck me about what he did at Tesla and at SpaceX [00:19:00] was that, every step of the way on the assembly line for SpaceX's rockets and for Teslas, basically every component of those products had to have a name attached to it: this person specifically defined this requirement for the product. It wasn't good enough to say legal specified this requirement, or engineering specified this requirement. Someone's ass had to be on the line, and if the requirement was deemed unnecessary, they were probably going to get fired, right? For slowing down the throughput on the assembly line. Those companies have that good cost culture because they have to, right? Those are notoriously difficult, low-margin businesses; it seems like we have to bail out another one of them in those industries every six months. Software has had the luxury of not having to think about this for a very long time, and that's led to a lot of suboptimal cultures, business practices, whatever you want to [00:20:00] call it. So we think there are massive gains to be had across all software businesses if we can just get the data, and then we think the incentives will follow and the culture will follow.

Louis-Victor Jadavji: I see. Yeah. So the incentives for a software tech unicorn are very different today from a couple of years ago, and maybe we'll see them, out of necessity, like you were alluding to earlier in the call, have to care about costs and develop a cost culture where everyone has some accountability in the process. Yeah, I can see that happening.

Micah Wheat: Engineers don't want to think about this stuff. Finance people, maybe they think it's interesting, but it's hard work, right? Basically identifying ownership, going to follow up with all the engineers who are causing these costs to be high. It's hard work, but the payoff can be huge. And I think increasingly companies are going to have to think about this stuff even if they don't want to. If you can't raise another round and your company's about to go out of business, and the only solution is to get your gross margin [00:21:00] positive, or whatever it is, to get to default alive, as we say in YC, they're going to have to do it.

Todd Kesselman: I was just going to say, I think it's important to highlight that it's not about cost management, it's not about lowering the bill. It's about managing the cost side in conjunction with how you manage the revenue side, so that you optimize where you are, right? There are times when you should be spending a lot and times when you shouldn't. I think that's important, because the shield goes up immediately: oh, somebody's just going to look over and say what we're doing is costing too much. But that's not the point. The point is the cost relative to the benefit, or to where the corporate strategy wants it to be, and then whether it's optimal.

Micah Wheat: Right. So you can optimize a high cost. The other piece of that is, maybe you uncover, once you're doing this marginal cost analysis, that the ROI on a specific feature or a specific product is pretty much [00:22:00] constantly or uniformly negative. Then the rational thing to do is just to cut that business unit, if you can't make it profitable. So I think you're absolutely right that it is about aligning revenue and costs, and finance and engineering, pricing and engineering, sales and engineering. But it also can be purely about cutting costs and trying to stop the bleeding.

Louis-Victor Jadavji: If I recall, Todd, we even had a customer that would incentivize their customer support people or salespeople on profitability, as it relates to cloud costs, for each major account. We actually did see that in the field.

Micah Wheat: Was that a usage-based pricing model that they went with, or do you remember how they did that?

Todd Kesselman: Actually, it wasn't when they started. But then, as we started to give them some marginal cost information, they gradually started to shift the model, because they found they had some clients who were... it's really easy on the sales side to associate high revenue with high [00:23:00] profitability, when that may not be the case.

Micah Wheat: Right, when that's not the case. Yeah.

Todd Kesselman: Yeah. So maybe your largest client is your best client, and maybe your largest client is your worst client. I think they found out that in this case there was a mixture of the two, because, to the point you made before about sometimes changing the way you charge, they didn't break out their products the right way, so that more usage of the more expensive product was...

Micah Wheat: Charged more, right? Yeah, it's like that old marketing cliché, right? The CMO who says, I know half of my marketing spend is wasted, I just don't know which half. We're hearing similar stuff from engineering leaders and finance people at SaaS companies, where they're saying, we know some of our customers are very low margin or deeply negative or whatever, we just don't know which ones they are.

Louis-Victor Jadavji: Yeah. And case in point, this was a company doing a lot of AI. And again, they had to [00:24:00] care about this stuff because it was a board-level prerogative to get margins up. And yeah, you're right, we're going to see more and more companies like this as they adopt AI into their feature set, because the margins just aren't as good; you go from 85-plus percent down to 70, if not under, and that makes all the difference.

Abhishek Singh: I heard LV using the term AI, and I don't think we'll have people listening to our podcast if we don't discuss AI. So I think I have a question around that. How are AI and machine learning going to affect the cloud spend of organizations? And how does that make marginal and unit cost accounting on the cloud more important?

Micah Wheat: Yeah, there's a great blog post on the Andreessen Horowitz blog, which I think you actually shared with me, LV, six months ago, from Martin Casado, about the new business of AI. There's also a great analysis on Jamin Ball's Substack looking at the example of Notion AI, right? When Notion rolled out their [00:25:00] AI summarizer features, all running through GPT-4, right? So it's an open question for a lot of these SaaS companies. It feels like we're still at the relatively early part of what may be a bubble, right? Who knows? There's certainly going to be some big businesses coming out of this, but it seems like every SaaS company in America right now is trying to force GPT-4 into their product. And it's an open question whether, for example in the case of Notion, adding this AI summarizer feature is actually going to yield positive ROI, right? For now, it's just included in Notion's enterprise plan. Eventually, will they have to charge more for that? Quite possibly. Eventually, will they have to roll out some kind of usage-based pricing model for it, some tier per number of summaries or whatever it is? I think they probably will eventually.
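A quick worked example of that open question, with hypothetical numbers rather than Notion's or OpenAI's actual prices: under a flat per-seat fee, the gross margin of an AI summarizer feature depends entirely on how heavily each seat uses it.

```python
# Hypothetical numbers, purely to illustrate the flat-fee vs usage-based tension;
# these are not Notion's or OpenAI's actual figures.
price_per_seat = 10.00              # flat monthly add-on price per seat
inference_cost_per_summary = 0.02   # what each LLM API call costs the vendor

def gross_margin(summaries_per_seat):
    cogs = summaries_per_seat * inference_cost_per_summary
    return (price_per_seat - cogs) / price_per_seat

for usage in (50, 300, 600):
    print(f"{usage} summaries/seat -> gross margin {gross_margin(usage):.0%}")
# 50 -> 90%, 300 -> 40%, 600 -> -20%: heavy users can flip the feature to
# negative margin, which is what pushes vendors toward tiered or usage-based pricing.
```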

Abhishek Singh: And what are some examples of AI companies that you've seen misrepresenting their cost structure because they lack insights on unit and marginal costs? Do you have any [00:26:00] examples around that?

Micah Wheat: Sorry, I missed that. Examples of AI...?

Abhishek Singh: Companies misrepresenting or misinterpreting their cost structures. Do you have any examples or insights around that?

Micah Wheat: I think it would be stuff similar to that Notion example, right? And you can also imagine a scenario, Notion isn't a great example here, in which the cost per invocation, per model inference run, isn't necessarily uniformly distributed across customers. It just gets really tricky to unravel these costs at the per-customer, per-usage-event level with AI, right? Specifically because with AI the inference costs are all variable costs, whereas before the tilt was much more heavily toward fixed costs, right?

Louis-Victor Jadavji: Yeah, fair enough. I think it is one of those things where people are shocked and surprised at what the truth is. They have all these wild assumptions about what their cost structure [00:27:00] actually is until you really show them. Hopefully this new wave of cloud cost management, or observability, tools makes a difference there.

And we're excited to see where Dashdive goes. I just want to circle back on the key takeaways of this conversation today. I had the following that I think the audience should really keep in mind. Number one is that we're in a different era, right? Software companies are feeling the pressure on cloud costs more than they did before, namely because they're introducing AI features, which are lower-margin features by nature, and also because they don't have access to ZIRP-era funding like they did before. The second one I had is that marginal cost is a very difficult concept to relate to engineers, and that's probably not going to change. So you tend to want to look for organizations that need to develop cost cultures as external pressures develop or pile on. And you did mention a few examples where, [00:28:00] as in manufacturing, businesses tend to have lower margins, so cost culture is ingrained; maybe we're going to see that in the world of software and technology going forward. Any other takeaways, folks, that you think the audience should have? You need a tool because it's...

Micah Wheat: Go ahead.

Louis-Victor Jadavji: I was going to say, you need a tool...

Todd Kesselman: ...need a tool...

Louis-Victor Jadavji: ...because it's complicated. Yeah. You need a tool. Yeah, yeah, it's not going to do it or show you this on its own. Exactly. Yeah, exactly. Great. Thank you so much, Micah. You've been a great guest, and thank you to my co-host Abhishek. And thank you, Todd, for chiming in as an expert today.

This has been great, and we hope the audience here can find ways, not necessarily to reduce cloud costs, but to improve their profitability, as sometimes you need to spend more to be more profitable, and the devil's in the details. All right. Thank you.

Micah Wheat: Thanks, guys.