Reliability
Dec 12, 2025
Michael Shobitan

Stop Losing Money on Spot Instance Interruptions in EKS

Learn how to handle AWS Spot Instance interruptions in EKS without losing reliability using Karpenter and Node Termination Handlers.

The Hidden Cost of Spot Interruptions

Spot Instances can save you up to 90% compared to On-Demand pricing. But there's a catch: AWS can reclaim them with only a 2-minute warning. Without proper handling, this leads to:

  • Failed deployments mid-rollout
  • Dropped connections and user-facing errors
  • Lost in-flight jobs and batch processing failures
  • Cascading failures when multiple nodes get reclaimed

Real Impact: One client was forgoing ~$4,200/month in potential Spot savings because fear of interruptions kept them on On-Demand. The actual solution took 2 hours to implement.

The Graceful Interruption Architecture

The key is building a system that treats interruptions as expected events, not emergencies. Here's the stack:

1. Karpenter for Intelligent Provisioning

Karpenter replaces Cluster Autoscaler with smarter, faster node provisioning. Key configuration:

# Karpenter NodePool with Spot diversity (Karpenter v1 API)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-diverse
spec:
  template:
    spec:
      requirements:
        # Only launch Spot capacity from this pool
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.large
            - m5.xlarge
            - m5a.large
            - m5a.xlarge
            - m6i.large
            - m6i.xlarge  # Diversify!
      nodeClassRef:
        # The v1 API requires group, kind, and name
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    # v1 name for the former WhenUnderutilized policy
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

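Why this works: Spot capacity is reclaimed per capacity pool (one instance type in one Availability Zone), so spreading nodes across several instance types means AWS is less likely to reclaim ALL your nodes at once.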
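
Listing instance types by hand goes stale as new families launch. One alternative is to describe the acceptable shapes and let Karpenter choose from every matching type. The sketch below assumes the AWS provider's well-known labels karpenter.k8s.aws/instance-category, karpenter.k8s.aws/instance-generation, and karpenter.k8s.aws/instance-size are available in your Karpenter version. The categories, generation cutoff, and sizes are illustrative choices, not values from the setup described here:

```yaml
# Sketch: a requirements fragment for the NodePool's spec.template.spec.
# All values below are illustrative assumptions; tune them to your workloads.
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["m", "c", "r"]      # general purpose, compute, memory optimized
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]                # generation 5 and newer
  - key: karpenter.k8s.aws/instance-size
    operator: In
    values: ["large", "xlarge"]  # keep node sizes comparable
```

A wider set of matching types gives Karpenter more Spot capacity pools to draw from, at the cost of less predictable node shapes.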

2. AWS Node Termination Handler

This DaemonSet watches the instance metadata service for the 2-minute interruption notice, then cordons and drains the node:

# Install via Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableRebalanceMonitoring=true \
  --set enableScheduledEventDraining=true

Note: Karpenter can also handle Spot interruption notices itself when it is configured with an SQS interruption queue. If you enable that, check the Karpenter documentation before running both handlers against the same nodes.

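Critical: The handler cordons the node and evicts pods through the Kubernetes Eviction API, so PodDisruptionBudgets are respected. A PDB can delay an eviction, but it cannot delay the reclaim itself: anything still running when the 2 minutes expire is terminated with the instance.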
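
Eviction sends SIGTERM to each container, and the pod then has terminationGracePeriodSeconds to exit before it is killed. Below is a minimal sketch of a workload tuned for that window, assuming a Deployment whose pods carry the app: api-server label. The image name, replica count, and timings are hypothetical and only illustrate the shape:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3                      # hypothetical replica count
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      # Must fit comfortably inside the 2-minute Spot notice,
      # including the preStop delay below.
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: example.com/api-server:1.0.0   # hypothetical image
          lifecycle:
            preStop:
              exec:
                # Short pause so load balancers stop routing traffic before
                # SIGTERM arrives; requires a sleep binary in the image.
                command: ["sleep", "10"]
```

The application still needs to handle SIGTERM itself by finishing in-flight requests and exiting; the manifest only sets the time budget.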

3. PodDisruptionBudgets (PDBs)

PDBs ensure you always have minimum availability during disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2  # Or use maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server

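Rule of thumb: Set minAvailable to N-1 where N is your replica count (the example above assumes 3 replicas). Never set minAvailable equal to N: no pod could ever be evicted, and the drain would stall until the instance is reclaimed.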
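
The same N-1 intent can be written without hard-coding the replica count, which helps when an autoscaler changes N. A sketch using maxUnavailable, with a hypothetical resource name:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb-max-unavailable   # hypothetical name
spec:
  maxUnavailable: 1               # at most one pod evicted at a time, whatever N is
  selector:
    matchLabels:
      app: api-server
```

Use this in place of the minAvailable version, not alongside it: when more than one PDB selects the same pod, the Eviction API refuses the eviction.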

The Results

  • 60-70% cost reduction vs On-Demand
  • 99.9% uptime maintained
  • <30s pod migration during interruption

Quick Implementation Checklist

  • Deploy AWS Node Termination Handler as a DaemonSet
  • Configure Karpenter with 5+ instance types for diversity
  • Set PodDisruptionBudgets for all critical workloads
  • Set terminationGracePeriodSeconds on every pod so shutdown finishes well inside the 2-minute notice
  • Test with a simulated Spot interruption (AWS FIS)
  • Monitor with Karpenter metrics + CloudWatch
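
For the interruption test in the checklist, one option is an AWS FIS experiment template managed through CloudFormation. The sketch below rests on several assumptions: that your Spot nodes carry a karpenter.sh/nodepool tag with the value spot-diverse, and that an IAM role FIS can assume (passed in as the hypothetical FisRoleArn parameter) already exists with permission to send Spot interruptions. Treat it as a starting point and verify it against the FIS documentation before running it:

```yaml
# CloudFormation sketch: an FIS experiment that interrupts one Spot instance.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  FisRoleArn:
    Type: String                  # assumption: existing role that FIS can assume
Resources:
  SpotInterruptionTest:
    Type: AWS::FIS::ExperimentTemplate
    Properties:
      Description: Interrupt one Spot node to rehearse graceful draining
      RoleArn: !Ref FisRoleArn
      StopConditions:
        - Source: "none"          # no alarm-based stop condition in this sketch
      Targets:
        oneSpotNode:
          ResourceType: aws:ec2:spot-instance
          ResourceTags:
            karpenter.sh/nodepool: spot-diverse   # assumption: tag on your nodes
          Filters:
            - Path: State.Name
              Values: ["running"]
          SelectionMode: "COUNT(1)"               # pick a single instance
      Actions:
        interrupt:
          ActionId: aws:ec2:send-spot-instance-interruptions
          Parameters:
            durationBeforeInterruption: PT2M      # mirror the real 2-minute notice
          Targets:
            SpotInstances: oneSpotNode
      Tags:
        Name: spot-interruption-test
```

Run it against a non-production cluster first and watch whether pods reschedule before the instance disappears.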

Want Us to Implement This For You?

Our EKS Kill-Box engagement includes Spot Instance optimization as standard. Get 18-35% savings with zero reliability risk.

Book Your Free Cost Scan