Stop Losing Money on Spot Instance Interruptions in EKS
Learn how to handle AWS Spot Instance interruptions in EKS without losing reliability using Karpenter and Node Termination Handlers.
The Hidden Cost of Spot Interruptions
Spot Instances can save you up to 90% compared to On-Demand pricing. But there's a catch: AWS can reclaim them with just two minutes' notice. Without proper handling, this leads to:
- Failed deployments mid-rollout
- Dropped connections and user-facing errors
- Lost in-flight jobs and batch processing failures
- Cascading failures when multiple nodes get reclaimed
Real Impact: One client was forgoing ~$4,200/month in Spot savings because fear of interruptions kept them on On-Demand. The actual solution took 2 hours to implement.
The Graceful Interruption Architecture
The key is building a system that treats interruptions as expected events, not emergencies. Here's the stack:
1. Karpenter for Intelligent Provisioning
Karpenter replaces Cluster Autoscaler with smarter, faster node provisioning. Key configuration:
# Karpenter NodePool with Spot diversity
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-diverse
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.large
            - m5.xlarge
            - m5a.large
            - m5a.xlarge
            - m6i.large
            - m6i.xlarge  # Diversify!
      nodeClassRef:
        name: default
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidateAfter is omitted here: in the v1beta1 API it can only be set
    # when consolidationPolicy is WhenEmpty.
Why this works: Instance type diversity means AWS is less likely to reclaim ALL your nodes at once.
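One way to picture the diversity argument: Spot capacity is organized into pools, where a pool is one instance type in one Availability Zone, and a reclaim is driven by demand in a particular pool. The sketch below only counts the pools the NodePool above can draw from. The three zone names are an assumption for illustration; the configuration itself does not specify zones.

```python
from itertools import product

# Instance types taken from the NodePool above.
INSTANCE_TYPES = [
    "m5.large", "m5.xlarge",
    "m5a.large", "m5a.xlarge",
    "m6i.large", "m6i.xlarge",
]
# Assumption: the cluster spans three Availability Zones (names are illustrative).
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]


def spot_pools(instance_types, zones):
    """Return every (instance type, zone) pair, i.e. each distinct Spot capacity pool."""
    return list(product(instance_types, zones))


pools = spot_pools(INSTANCE_TYPES, ZONES)
print(f"{len(pools)} pools available")  # 6 types x 3 zones = 18
```

Pinning a single instance type under the same zone assumption would leave only 3 pools, so one busy pool would affect a much larger share of the fleet.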
2. AWS Node Termination Handler
This DaemonSet catches the 2-minute warning and gracefully drains pods:
# Install via Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableRebalanceMonitoring=true \
  --set enableScheduledEventDraining=true
Critical: The handler cordons the node and evicts its pods while respecting PodDisruptionBudgets.
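To make the 2-minute warning concrete: EC2 publishes the interruption notice through the instance metadata path /latest/meta-data/spot/instance-action as a small JSON document with an action and a time field. The following is a minimal sketch of turning that payload into a drain budget. It is not the handler's own code, the function name is hypothetical, and the sample payload is hard-coded rather than fetched from the metadata service.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Seconds left between `now` and the reclaim time stated in the notice."""
    notice = json.loads(notice_json)
    # The notice carries a UTC timestamp such as 2024-01-01T12:02:00Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Illustrative payload in the shape of an instance-action notice.
sample_notice = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(sample_notice, now))  # 120.0
```

Whatever the pods need to do on shutdown has to fit inside that window, which is why the grace period settings later in the checklist matter.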
3. PodDisruptionBudgets (PDBs)
PDBs ensure you always have minimum availability during disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2  # Or use maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server
Rule of thumb: Set minAvailable to N-1, where N is your replica count.
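The N-1 rule and its effect on a drain can be sketched as simple arithmetic. The example assumes api-server runs 3 replicas, which the text does not state; both function names are hypothetical, and rejecting a single-replica workload is a choice made for this sketch rather than anything Kubernetes enforces.

```python
def rule_of_thumb_min_available(replicas: int) -> int:
    """Apply the N-1 rule: keep all but one replica available."""
    if replicas < 2:
        # With one replica, N-1 is 0 and the budget would protect nothing.
        raise ValueError("the N-1 rule needs at least 2 replicas")
    return replicas - 1


def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """How many pods may be evicted right now without violating the budget."""
    return max(healthy_pods - min_available, 0)


replicas = 3  # assumption: api-server runs 3 replicas
budget = rule_of_thumb_min_available(replicas)  # 2, matching api-pdb above
print(allowed_disruptions(healthy_pods=3, min_available=budget))  # 1
print(allowed_disruptions(healthy_pods=2, min_available=budget))  # 0
```

When the second call returns 0, an eviction request is refused until a replacement pod becomes healthy, so a drain moves one pod at a time instead of taking the service down.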
The Results
- 60-70% cost reduction vs On-Demand
- 99.9% uptime maintained
- <30s pod migration during interruption
Quick Implementation Checklist
- Deploy AWS Node Termination Handler as a DaemonSet
- Configure Karpenter with 5+ diverse instance types
- Set PodDisruptionBudgets for all critical workloads
- Ensure all pods have proper terminationGracePeriodSeconds
- Test with Spot interruption simulation (AWS FIS)
- Monitor with Karpenter metrics + CloudWatch
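The terminationGracePeriodSeconds item only pays off if the application reacts to SIGTERM, the signal Kubernetes sends to a container when its pod is evicted; once the grace period expires the container is killed outright. A minimal sketch of a worker that finishes its in-flight job and then stops taking new ones follows. The class and function names are hypothetical and the job handling is a stand-in for real work.

```python
import signal


class GracefulWorker:
    """Finishes the in-flight job, then stops picking up new ones after SIGTERM."""

    def __init__(self):
        self.stopping = False
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Keep the signal handler tiny: only flip a flag the main loop checks.
        self.stopping = True

    def run(self, jobs, handle):
        for job in jobs:
            if self.stopping:
                break  # leave the remaining jobs for another replica
            handle(job)  # the current job always runs to completion
            self.completed.append(job)
        return self.completed


def main():
    worker = GracefulWorker()
    # Register the handler once at startup, from the main thread.
    signal.signal(signal.SIGTERM, worker.request_stop)
    worker.run(range(3), handle=print)


if __name__ == "__main__":
    main()
```

The longest single job should fit comfortably within terminationGracePeriodSeconds, and that value in turn needs to fit inside the 2-minute Spot warning.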
Want Us to Implement This For You?
Our EKS Kill-Box engagement includes Spot Instance optimization as standard. Get 18-35% savings with zero reliability risk.