Kubernetes has become the dominant infrastructure abstraction for enterprise cloud workloads — and it is one of the most expensive when not actively optimised. The combination of conservative resource requests set by engineers, idle cluster capacity maintained for burst tolerance, and overprovisioned node pools means that typical Kubernetes environments run at 15–30% average CPU utilisation. The gap between provisioned and utilised capacity represents substantial waste. This article is part of our enterprise cloud cost optimisation framework.
Kubernetes cost optimisation has two distinct layers: the workload layer (pod resource requests, limits, and autoscaling configuration) and the infrastructure layer (node pool sizing, instance type selection, and cluster consolidation). Most Kubernetes cost discussions focus on one layer or the other; this guide addresses both, together with the commitment discount strategy that sits above them.
The most common pattern we see in enterprise Kubernetes environments: engineers set resource requests 2–4x higher than actual usage (to avoid OOM kills and throttling), cluster autoscaler adds nodes to satisfy those requests, and the result is a cluster running at 20–25% utilisation. No one is "wrong" — the individual decisions are rational — but the aggregate effect is an estate that costs 2–4x more than necessary for the actual workload. Systematic rightsizing addresses this at both layers simultaneously.
The Kubernetes Cost Problem
Kubernetes cost optimisation is distinct from general cloud compute optimisation because the cost drivers operate at multiple layers simultaneously. The VM layer (EC2 instances in EKS, VMs in AKS, Compute Engine nodes in GKE) is where money is actually spent — but the sizing of those VMs is determined by the aggregate resource requests of the pods scheduled on them, which are set by application developers who have no financial visibility into their impact on infrastructure costs.
This disconnect between the people who set resource requests (developers) and the people who pay the resulting bills (FinOps/infrastructure teams) is the root cause of Kubernetes overprovisioning. Without feedback mechanisms that connect resource request decisions to cost outcomes, engineers will consistently err on the side of over-requesting — a rational defensive choice given the consequences of under-requesting (OOM kills, CPU throttling, degraded application performance) versus the consequences of over-requesting (nothing visible to the engineer).
Typical K8s Cost Distribution
| Cost Component | Typical % of K8s Spend | Optimisation Lever |
|---|---|---|
| Worker node compute (on-demand) | 50–65% | Rightsizing + Spot + Commitments |
| Worker node compute (committed) | 15–25% | Portfolio optimisation |
| Managed control plane (EKS/AKS/GKE) | 5–10% | Cluster consolidation |
| Persistent storage (EBS/Disk/PD) | 10–20% | Storage class + lifecycle |
| Load balancers + networking | 5–10% | Architecture review |
Kubernetes Cost Visibility and Attribution
Before optimising Kubernetes costs, the enterprise needs visibility at the workload level — not just the cluster level. Knowing that a Kubernetes cluster costs $150K per month is not actionable; knowing that a specific application namespace within that cluster accounts for $40K and runs at 18% CPU utilisation is. Granular attribution is the prerequisite for targeted optimisation.
Namespace Cost Allocation
Kubernetes namespaces are the natural unit of cost attribution for most enterprise K8s environments. Each team or application typically owns one or more namespaces, and pod-level resource consumption metrics (CPU requests, memory requests, actual CPU usage, actual memory usage) are available per namespace via the Kubernetes Metrics API or tools like kube-state-metrics and Prometheus.
The challenge is translating pod-level resource consumption into dollar costs, which requires proportional allocation of node costs to the pods scheduled on each node. Tools such as OpenCost (open source), Kubecost, Cast.ai, and StormForge provide this allocation, mapping pod resource requests and usage to node costs and presenting namespace-level or label-level cost breakdowns. Implementing one of these tools is the foundational step for any serious Kubernetes cost optimisation programme.
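The proportional allocation these tools perform can be sketched in a few lines. This is a simplified illustration of the principle, not any vendor's actual algorithm; pod names, CPU requests, and the node price are invented values:

```python
# Sketch: allocate a node's hourly cost across the pods scheduled on
# it, proportionally to their CPU requests. Real tools (OpenCost,
# Kubecost) also weight memory and handle idle/unallocated capacity.

def allocate_node_cost(node_hourly_cost, pod_cpu_requests):
    """Split a node's hourly cost across pods by CPU-request share."""
    total_requested = sum(pod_cpu_requests.values())
    return {
        pod: node_hourly_cost * cpu / total_requested
        for pod, cpu in pod_cpu_requests.items()
    }

# Illustrative: a ~$0.192/hr node with three pods requesting
# 2 cores, 1 core, and 1 core respectively.
costs = allocate_node_cost(0.192, {"api": 2.0, "worker": 1.0, "cron": 1.0})
```

Summing these per-pod shares across all nodes, grouped by namespace or label, yields the namespace-level breakdowns described above.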
Labels as the Attribution Layer
Kubernetes labels provide the attribution taxonomy below the namespace level. A consistent labelling convention — applying labels such as team, product, environment, cost-centre, and application to all pods and deployments — enables cost attribution at the label dimension. This requires enforcement: a policy that prevents workloads without required labels from being scheduled ensures attribution quality over time, as otherwise individual teams will deploy unlabelled workloads that become "dark spend" in FinOps reporting.
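An admission policy is the usual enforcement mechanism. A hedged sketch using Kyverno (one common choice, assuming it is installed in the cluster; the policy name and label keys mirror the convention above and are illustrative):

```yaml
# Reject pods that lack the required cost-attribution labels.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels   # illustrative name
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-cost-labels
    match:
      any:
      - resources:
          kinds: [Pod]
    validate:
      message: "team, cost-centre and environment labels are required"
      pattern:
        metadata:
          labels:
            team: "?*"          # "?*" = any non-empty value
            cost-centre: "?*"
            environment: "?*"
```

OPA Gatekeeper or a cloud-native policy service can enforce the same rule; the point is that unlabelled workloads are rejected at admission rather than discovered later as dark spend.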
Resource Requests and Limits Optimisation
The most impactful Kubernetes cost optimisation action is typically the most straightforward: reducing overinflated resource requests. Resource requests determine how much CPU and memory the Kubernetes scheduler reserves on a node for a pod — if pods request more than they use, the difference is wasted reserved capacity that cannot be used by other pods, forcing the cluster autoscaler to provision additional nodes unnecessarily.
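To make the requests/limits distinction concrete, here is an illustrative container resources block (the specific values are examples, not recommendations):

```yaml
# Fragment of a Deployment container spec. The scheduler reserves the
# `requests` values on a node whether or not the pod uses them;
# `limits` cap what the container may actually consume.
resources:
  requests:
    cpu: "500m"        # half a core reserved on the node
    memory: "512Mi"
  limits:
    cpu: "1"           # CPU-throttled above one core
    memory: "1Gi"      # OOM-killed above 1 GiB
```

If this container averages 100m of CPU, the remaining 400m of its request is reserved but unusable by other pods, and that waste compounds across every replica.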
The Requests vs Usage Gap
Across enterprise Kubernetes environments we have reviewed, the median CPU request-to-usage ratio is approximately 3:1 — meaning pods request three times the CPU they actually consume on average. For memory, the ratio is typically lower (1.5–2:1) because memory limits are more likely to cause OOM kills, so engineers are slightly more conservative. A cluster running at 3:1 CPU request-to-usage effectively charges the enterprise for three times the compute it actually consumes.
Identifying the worst offenders is straightforward: pull 30-day average CPU and memory usage per pod from Prometheus or the cloud-native monitoring service (AWS Container Insights, Azure Monitor for Containers, GCP Cloud Monitoring) and calculate the request-to-usage ratio per pod. Pods with ratios above 3:1 CPU or 2:1 memory are priority rightsizing targets.
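The screening logic described above can be sketched as follows. The pod figures are invented for illustration; in practice they come from Prometheus or the monitoring services named above:

```python
# Sketch: flag priority rightsizing targets from 30-day averages,
# using the thresholds from the text (3:1 CPU, 2:1 memory).

PODS = [
    # (name, cpu_request_cores, cpu_avg_used, mem_request_gib, mem_avg_used)
    ("checkout",  4.0, 0.9, 8.0, 3.2),
    ("search",    2.0, 1.6, 4.0, 3.5),
    ("reporting", 1.0, 0.2, 2.0, 0.4),
]

def rightsizing_targets(pods, cpu_threshold=3.0, mem_threshold=2.0):
    """Return (name, cpu_ratio, mem_ratio) for pods over either threshold."""
    targets = []
    for name, cpu_req, cpu_use, mem_req, mem_use in pods:
        cpu_ratio = cpu_req / cpu_use
        mem_ratio = mem_req / mem_use
        if cpu_ratio > cpu_threshold or mem_ratio > mem_threshold:
            targets.append((name, round(cpu_ratio, 1), round(mem_ratio, 1)))
    return targets
```

Here `checkout` (4.4:1 CPU) and `reporting` (5:1 on both dimensions) would be flagged, while `search` runs close to its requests and is left alone.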
VPA and HPA: Automated Rightsizing
Manual rightsizing of resource requests is a one-time exercise that becomes stale as workloads evolve. Automated rightsizing tools — specifically the Vertical Pod Autoscaler (VPA) and Horizontal Pod Autoscaler (HPA) — provide continuous adjustment of resource allocations based on observed usage patterns.
Vertical Pod Autoscaler (VPA)
VPA automatically adjusts the CPU and memory requests of pods based on observed resource utilisation history. In recommendation mode (the safest starting point), VPA analyses usage patterns and surfaces rightsizing recommendations without applying them; these can be reviewed and applied manually. In auto mode, VPA applies new resource requests by evicting and recreating pods when significant rightsizing opportunities are identified.
VPA in recommendation mode is an excellent tool for generating a priority-ranked list of rightsizing targets across the cluster, surfacing which pods have the largest gap between requests and actual usage. Most enterprises start here before enabling auto mode, as pod restarts during VPA auto adjustments require application compatibility assessment.
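A minimal VPA manifest in recommendation mode looks like the following (the workload name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa          # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout            # illustrative target workload
  updatePolicy:
    updateMode: "Off"         # recommendation mode: surface, don't apply
```

With `updateMode: "Off"`, recommendations appear in the VPA object's status for review; switching to `"Auto"` later enables the pod-evicting behaviour described above.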
Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of pod replicas based on observed CPU utilisation or custom metrics. For workloads with variable traffic patterns (web services, APIs, batch processors), HPA ensures that the pod count tracks actual demand rather than remaining static at a level provisioned for peak capacity. The interaction between HPA and the cluster autoscaler is important to configure correctly: when HPA reduces replicas during low demand, it frees node capacity that the cluster autoscaler can then reclaim by removing nodes.
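A typical CPU-based HPA using the `autoscaling/v2` API (workload name and targets are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout              # illustrative target workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # add/remove replicas to hold ~60% average CPU
```

Note that HPA's utilisation target is measured against the pod's CPU *request*, which is another reason inflated requests are costly: they make HPA scale later than it should.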
Node Pool and Cluster Rightsizing
Even with optimal pod-level resource requests, the infrastructure layer requires independent rightsizing. Node pool configuration — instance type, size, minimum and maximum node counts, and instance family — has a direct impact on both cost and utilisation efficiency.
Instance Type Selection for Node Pools
The choice of EC2/VM instance type for Kubernetes node pools has significant cost implications. Many enterprises inherit node pool configurations from initial cluster setup that are not optimal for the evolved workload mix. Key considerations: CPU-to-memory ratio (workloads that are memory-heavy benefit from memory-optimised instances; compute-heavy workloads from compute-optimised), instance generation (newer instance generations typically offer better price-performance), and ARM vs x86 (Graviton/Ampere instances on AWS and Azure provide 20–40% better price-performance for compatible workloads).
Cluster Autoscaler Configuration
The cluster autoscaler should be configured to scale down aggressively during low-utilisation periods. Common misconfigurations that prevent scale-down: the scale-down utilisation threshold left too low (with the default of 50%, nodes running at 50–65% utilisation are never candidates; raising it to 60–70% often unlocks scale-down in practice), the scale-down delay extended well beyond the 10-minute default, and pod disruption budgets that prevent node draining. Reviewing and tuning cluster autoscaler scale-down behaviour is a high-ROI, low-risk action in most enterprise K8s environments.
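For the upstream Kubernetes cluster-autoscaler, these settings are flags on the autoscaler Deployment. A sketch of a more aggressive scale-down configuration (the specific values are examples to tune against your tolerance for pod churn, not recommendations):

```yaml
# Container args fragment for the cluster-autoscaler Deployment.
- --scale-down-enabled=true
- --scale-down-utilization-threshold=0.65  # nodes below 65% become candidates
- --scale-down-unneeded-time=5m            # default 10m; shorter = faster removal
- --scale-down-delay-after-add=5m          # default 10m; how soon after scale-up
```

Managed offerings (EKS managed node groups with the autoscaler Helm chart, AKS, GKE) expose equivalents of these knobs through their own configuration surfaces.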
Spot and Preemptible Nodes
Spot instances (AWS), Spot VMs (Azure), and Preemptible/Spot VMs (GCP) offer 60–90% cost reductions versus on-demand pricing for interruptible compute. For Kubernetes workloads, Spot nodes are available through node pools configured with Spot instance types and appropriate fault-tolerance configuration.
The key to effective Spot usage in Kubernetes is workload classification: not all pods are Spot-appropriate, and mixing Spot and on-demand nodes requires correct node affinity and toleration configuration to route the right workloads to the right node types. Stateless, horizontally scalable workloads (microservices, APIs with multiple replicas, batch workers) are typically good Spot candidates. Stateful workloads, single-replica deployments, and workloads with strict SLAs require on-demand nodes.
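The routing mechanics can be sketched as a pod spec fragment. The taint key here assumes the Spot node pool is tainted `spot=true:NoSchedule` (a common convention, not a platform default), and the node label shown is the EKS managed-node-group capacity label; AKS and GKE use their own label keys:

```yaml
# Pod spec fragment for a stateless workload that prefers Spot capacity.
tolerations:
- key: "spot"                 # assumes the Spot pool is tainted spot=true:NoSchedule
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: eks.amazonaws.com/capacityType   # EKS label; differs on AKS/GKE
          operator: In
          values: ["SPOT"]
```

Using *preferred* rather than *required* affinity lets the workload fall back to on-demand nodes when Spot capacity is reclaimed.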
Configuring a mixed-instance node group with on-demand as baseline and Spot as burst capacity — a pattern supported by EKS managed node groups, AKS Spot node pools, and GKE node pools — allows the enterprise to achieve 40–60% Spot cost reductions on the burst layer while maintaining on-demand reliability for the baseline. The broader cloud waste and cost elimination context is covered in our article on cloud waste reduction strategies.
Cluster Consolidation
Cluster proliferation — the creation of many small, siloed Kubernetes clusters for individual teams or applications — is a common pattern in enterprise organisations where development teams have the autonomy to spin up their own clusters. Small clusters are expensive on a per-workload basis because the managed control plane cost (EKS: $0.10/hr per cluster; AKS: free tier available, with a $0.10/hr fee on the Standard tier; GKE: $0.10/hr per cluster, with one zonal or Autopilot cluster free per billing account) and minimum node pool requirements apply regardless of workload size.
A cluster with one node running three pods costs approximately the same as one with one node running 30 pods — the per-pod cost difference is dramatic. Consolidating many small clusters into fewer, larger clusters with strong namespace-based multi-tenancy reduces control plane overhead, improves utilisation efficiency, and simplifies commitment discount portfolio management. The governance challenge — persuading individual teams to share clusters — is organisational rather than technical, but the cost savings justify the investment.
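The per-pod arithmetic for the example above is straightforward. The node price here is an illustrative on-demand rate, and the control plane fee is the EKS $0.10/hr figure from the text:

```python
# Back-of-envelope per-pod cost: one node plus control plane,
# shared by 3 pods vs 30 pods.
HOURS_PER_MONTH = 730

control_plane = 0.10 * HOURS_PER_MONTH    # $73/month per cluster
node = 0.192 * HOURS_PER_MONTH            # ~$140/month, illustrative node rate

per_pod_small = (control_plane + node) / 3    # ~$71/pod/month
per_pod_large = (control_plane + node) / 30   # ~$7/pod/month
```

The monthly bill is nearly identical in both cases, but the per-pod cost differs by 10x, which is the economic case for consolidation.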
Commitment Discounts for K8s Workloads
The node instances underlying Kubernetes clusters are subject to the same commitment discount mechanics as any other cloud compute — Reserved Instances, Savings Plans, and CUDs apply to the underlying VMs whether or not they run Kubernetes. Kubernetes workloads are actually well-suited for commitment discount coverage because the aggregate node-level compute tends to be more stable than individual VM utilisation — even as individual pods scale up and down, the cluster as a whole often maintains a consistent infrastructure baseline.
The optimal commitment strategy for enterprise Kubernetes is to cover the stable node baseline with Compute Savings Plans (AWS) or spend-based CUDs (GCP), rather than instance-specific Reserved Instances — the dynamic node pool composition in a well-tuned K8s cluster makes resource-specific commitments less appropriate. For the full commitment portfolio framework, see our guide on Reserved Instances vs Savings Plans.
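Sizing the commitment to the stable baseline can be sketched as a percentile over observed node counts: commit to a level the cluster rarely drops below, so the commitment stays fully utilised even in quiet hours. The series and percentile choice are illustrative, not a recommendation:

```python
# Sketch: pick a commitment baseline from hourly node-count history.

def commitment_baseline(hourly_node_counts, percentile=10):
    """Node count at the given low percentile of the observed series."""
    ordered = sorted(hourly_node_counts)
    idx = int(len(ordered) * percentile / 100)
    return ordered[idx]

# One simulated day: quiet overnight hours at 8-10 nodes,
# business-hours peaks at 14-20 nodes.
counts = [8, 8, 9, 9, 10, 10, 8, 9, 14, 16, 18, 20,
          20, 18, 16, 15, 14, 14, 12, 10, 9, 8, 8, 8]
baseline = commitment_baseline(counts)   # commit to ~8 nodes' worth of compute
```

Multiplying the baseline node count by the per-node rate gives the hourly spend figure to cover with a Compute Savings Plan or spend-based CUD; the burst layer above it is left to on-demand and Spot.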
Next Steps
Kubernetes cost optimisation is a multi-layer programme that requires both technical intervention (resource requests, VPA, HPA, node pool configuration) and governance improvements (cost attribution, team accountability, commitment portfolio management). The highest-ROI starting point varies by environment, but for most enterprises with significant K8s spend, the combination of workload rightsizing via VPA recommendations and Spot node adoption for batch workloads delivers the largest initial savings with manageable operational risk.
For the broader cloud cost optimisation context, return to our enterprise cloud cost optimisation pillar guide. For the overarching cloud waste picture that encompasses Kubernetes and other infrastructure categories, see our article on cloud waste reduction strategies. IT Negotiations provides independent advisory on cloud infrastructure cost optimisation — contact us for a no-obligation Kubernetes cost assessment.