AI Workloads Are Breaking Your FinOps Model

Your board approved $10M for AI transformation to gain a competitive advantage. Six months later, you have already burnt through $7M. The CEO asks, ‘What business value have we delivered so far?’ Your FinOps reporting and dashboards show all green, but you cannot answer.

I see this trend more and more, and with GenAI spending expected to reach around $644B this year (Gartner) and grow further next year, it represents one of the largest capital allocation blind spots for organisations. High failure rates in GenAI proof-of-concept projects and widespread dissatisfaction with the results only compound the problem.

The uncomfortable truth is that your FinOps maturity just became a liability, not the competitive advantage you thought it was. The FinOps Foundation’s 2025 framework recognises that AI workloads differ fundamentally from the traditional cloud patterns on which we meticulously built our FinOps models and practices. A GenAI model consuming GPU hours behaves nothing like a traditional web app, regardless of how you deploy it - virtual machines, containers, or any other cloud-native way.

Why Your Cost Models Are Obsolete

The current FinOps model mastered complexity. Spikes for your Black Friday sales? You auto-scale to meet demand. Marketing campaigns gone viral? You absorb them with spot instances. Seasonal variance? You handle it with commitment discounts. You have playbooks for all these scenarios and can now forecast quarterly spend with small variances despite the unpredictability.

AI does not honour this logic, and this is where the pattern breaks down. Your cloud and infrastructure spend used to closely mirror your business activity: high during campaigns, low during quiet periods. GenAI workloads ignore this completely. They spike when your data scientists switch model architectures, sit idle for days of experimentation, then consume everything available - and more - during training runs. GenAI workloads can cost up to 15x more to run than standard infrastructure, destroying carefully planned budgets.

The procurement trap: Remember those hard-fought enterprise agreements and carefully planned commitment-based discounts, built on the forecasts your mature FinOps practice produced? Throw them out. AI workloads are too unpredictable for long-term commitments. You are either losing money on under-utilised reservations or stifling innovation and blowing budgets with insufficient commitment-based discounts. There is no real middle ground when workload patterns shift on a weekly basis.

The hidden multipliers: Even if you manage to navigate these uncharted waters, data pre-processing, labelling and transformation add cost layers that traditional FinOps methodologies never modelled. Failed experiments can consume millions of dollars before producing any business value. Retraining models can double your spend overnight. On top of this, compliance requirements for GenAI workloads add new cost categories that you may not even be able to track.

We face a brutal market reality. Everyone grapples with the same problem: how do we leverage our existing FinOps practices to measure AI value? Those who recognise this gap and experiment with new approaches will pull ahead, while those who try to shoehorn AI spending into traditional structures will fall behind. Gartner predicts that around 30% of GenAI projects will be abandoned due to rising costs and unclear business value. The real question is whether the capital invested is building a competitive advantage with clear business value or merely funding experiments.

Don’t Manage Costs, Manage Capital

If you focus on infrastructure efficiency, you’ve already lost! Here’s what happens in most organisations: the FinOps team celebrates achieving 85% GPU utilisation. The CFO’s reports show costs under control. The infrastructure teams hit or exceed their efficiency targets. Everyone celebrates the wins (and gets their bonus!). Meanwhile, the business gains nothing.

The implementation gap: FinOps principles correctly emphasise business value over cost efficiency. But when faced with AI workloads, most organisations retreat to what they know best: measuring infrastructure costs such as GPU hours, utilisation rates and training costs. The frameworks are right; the metrics are not.

GenAI requires a higher tolerance for indirect, future financial investment criteria versus immediate ROI. (Gartner Research)

Traditional cloud value metrics are simply not designed to capture option value, accelerated learning or capability-building speed.

Let’s take a moment to think about what organisations actually end up rewarding:

  • A perfectly utilised GPU cluster running failed experiments
  • An efficiently managed training pipeline producing the wrong capabilities

We’ve mastered the “Inform” and “Operate” phases of FinOps for traditional workloads. However, the “Operate” phase in most organisations has not evolved to cater for AI’s venture-capital-like dynamics.

The zombie portfolio problem: Walk into any organisation with an enterprise AI programme and you’ll find them: long-running projects quietly consuming hundreds of thousands of dollars while producing nothing. The dashboards all look green. The FinOps team knows something’s not quite right but lacks the vocabulary to express it. How do you even quantify “strategic learning” in a cost allocation model?

When the dashboards can’t distinguish between valuable experiments and wasteful meandering, every project appears justified. The cost per inference is competitive, the technical KPIs are green, but no one can answer: “Is this actually creating a competitive advantage?”

The blind spot: BCG research shows leading companies allocate over 80% of AI investments to reshaping key functions. They stay true to the FinOps principles - business value first - but with AI-native metrics: transformation velocity, capability adoption rates, and so on. The framework is sound; the implementation for AI needs to catch up. Fix the metrics, not the principles.

The Portfolio Revolution

Successful enterprises have borrowed concepts from venture capital, applying portfolio theory to AI investments. Not all initiatives are treated equally, nor is every investment expected to succeed. They do, however, expect the portfolio to deliver returns.

The three-horizon portfolio method

  • Horizon 1 - Efficiency Plays: Process automation, cost reduction initiatives and activities with a clear ROI within six months. These fund the transformation.
  • Horizon 2 - Revenue Expansion: Enhanced customer experience, new product capabilities, value realisation within 12 months. This drives growth.
  • Horizon 3 - Strategic Options: Emerging capabilities, future platforms and competitive moats that secure your future.

Organisational Discipline

Failure is natural. It should be part of the strategy, as long as you “fail fast”. The project kill rate should be around 30%, if not more. If it’s lower, you are not taking enough intelligent risks - likely succumbing to the sunk cost fallacy and missing better paths. This isn’t waste; it’s how you optimise your investment portfolio. You would do exactly this with your share portfolio, so why not with projects?

Institute real governance. Replace IT steering committees that debate technical architecture with investment committees that make capital allocation decisions. Think like VCs managing a fund, not IT managing projects.

Next, measure the health of the portfolio, not just project success. A portfolio with a 100% success rate is just as concerning as one with 100% failures. Look at the full spectrum, from the safe bets that fund experiments (your cash cows) to the experiments that discover breakthroughs.

It’s a stark competitive reality: traditional project-by-project governance simply cannot keep up with the speed of a portfolio-based governance model. You don’t want to be stuck preparing quarterly reviews while the leadership team expects to make monthly adjustments.

The 90-day plan

So what’s in the 90-day plan? Nothing that needs massive investment - just executive will and the courage to implement. This must be a top-down decision, not bottom-up.

Move 1 - Elevate the conversation:

You need to shift AI governance to at least CFO level. Treat it as the capital allocation strategy it is, not a technology initiative. Create an AI Investment Committee that meets on a regular cadence. Let the technical steering group debate the architecture while the Investment Committee debates the capital allocation.

Move 2 - Implement portfolio-based metrics:

Stop tracking just infrastructure; it’s needed, but it’s not the full picture. Start measuring the portfolio with specific, actionable metrics. Tied back to the three-horizon portfolio method, it looks something like this:

Horizon 1 (Efficiency):

  • Value Delivery: Track actual reduction in process time, headcount hours saved, operational costs eliminated and so on. Not “potential” savings but actual, realised benefits with an impact on your P&L.
  • Speed Gates: 12 weeks to pilot, 6 months to production. Projects exceeding these timelines get one review to justify continued investment or face termination.
  • Success Thresholds: Must return “x” times the investment within a specified period, for example 2x returns in 12 months. Terminate projects not meeting this rate of return and reallocate the capital elsewhere.

Horizon 2 (Revenue Expansion):

  • Capability Adoption: Track usage patterns, not deployments - for example, monthly active users divided by the target user base. If the helpdesk team has an AI assistant, measure how many queries they run daily, not just how many people have access. This ties back to Value Delivery, confirming that real savings are being achieved.
  • Option Value: Track new capabilities unlocked, not instant returns. Can you now process unstructured data that you previously could not? Enter new markets? Ship product features that would be impossible without this capability? The questions will depend on your organisation.
  • Strategic Gates: Every 6 months ask, “Does this still represent a current or future competitive advantage?” If the answer is no, shut it down and reallocate the capital.

Horizon 3 (Strategic):

  • Learning Velocity: The number of hypotheses tested each quarter, not just projects completed. Document the learnings, not just what you built.
  • Capital Velocity: How much funding moves between horizons each quarter? Static allocation means you are either not learning or not being aggressive enough.
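These portfolio-level measures are simple ratios, which makes them easy to automate. A minimal sketch; the function names and the 30% kill-rate benchmark are illustrative assumptions drawn from this article, not an industry standard:

```python
def adoption_rate(monthly_active_users: int, target_user_base: int) -> float:
    """Capability adoption: usage, not deployments."""
    return monthly_active_users / target_user_base

def kill_rate(terminated: int, total_projects: int) -> float:
    """Share of projects killed; around 0.3 or more suggests healthy risk-taking."""
    return terminated / total_projects

def capital_velocity(reallocated_this_quarter: float, total_portfolio: float) -> float:
    """Fraction of funding that moved between horizons this quarter.
    Near zero means a static portfolio: not learning, or not aggressive enough."""
    return reallocated_this_quarter / total_portfolio

# Example: 120 of 400 helpdesk agents use the assistant monthly -> 30% adoption.
print(adoption_rate(120, 400))                   # 0.3
print(kill_rate(3, 10))                          # 0.3 - at the healthy threshold
print(capital_velocity(2_000_000, 20_000_000))   # 0.1
```

The point is less the arithmetic than the inputs: each ratio forces you to collect usage, termination and reallocation data that traditional cost dashboards never capture.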

Move 3 - Institute ruthless discipline:

Run regular portfolio reviews (at least quarterly, monthly if feasible) with actual decisions, not just debate. Rebalance quarterly to six-monthly, moving funding between horizons. Document every decision to terminate or stop a project, to improve future selection criteria. Use the returns from Horizon 1 to fund the experiments in Horizon 3.

These frameworks exist. My capital allocation article provides the foundations. The winners will be determined by the speed at which they implement. Don’t spend six months debating this; the leaders will implement it in 90 days and leave you behind.

The Strategic Imperative

The AI investment explosion shows no signs of slowing; it is accelerating. Gartner forecasts worldwide GenAI spending will hit $644 billion by the end of 2025, a 76% increase over 2024. Most organisations will invest in AI in one form or another. The question remains: will that investment create a competitive advantage, or end up as an expensive hobby?

FinOps maturity without AI adaptation becomes technical debt. Don’t let the comprehensive cost management framework you’ve spent years building prevent AI value realisation. The choice is simple: transform your FinOps practice to cater for AI workloads, or fail managing them with traditional FinOps methods.

Ready to Transform Your AI Investment Approach?

Find out how to implement portfolio-based AI governance in your organisation.

Contact Us