Kubernetes provides powerful features for managing containerized applications, including auto-scaling, which dynamically adjusts the number of running instances based on workload metrics. The kubectl autoscale command creates a HorizontalPodAutoscaler (HPA) for a deployment, replica set, stateful set, or replication controller, helping you keep resource utilization and responsiveness in balance. Note that kubectl autoscale itself only accepts a CPU utilization target (--cpu-percent); scaling on memory, custom, or external metrics requires an autoscaling/v2 HorizontalPodAutoscaler manifest applied with kubectl apply.
Here are several examples demonstrating how to use kubectl autoscale and the HPA effectively:
Example 1: Auto-scale a deployment named frontend to target 50% average CPU utilization, with a minimum of 2 replicas and a maximum of 5 replicas.
kubectl autoscale deployment frontend --cpu-percent=50 --min=2 --max=5
Output: horizontalpodautoscaler.autoscaling/frontend autoscaled
Verification Steps: Check the HPA with kubectl get hpa frontend, then watch the replica count with kubectl get deployment frontend as CPU load changes.
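For reference, the command above is roughly equivalent to applying the following autoscaling/v2 manifest (a sketch; the metrics-server must be installed for CPU utilization to be available):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50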
Example 2: Set auto-scaling for a deployment named backend based on a custom metric, such as requests per second (rps), targeting an average of 100 rps per pod, with a minimum of 3 replicas and a maximum of 10 replicas. kubectl autoscale has no --metric-name or --metric-target flags; custom metrics are configured with an autoscaling/v2 manifest (see the sketch below), saved to a file such as backend-hpa.yaml and applied:
kubectl apply -f backend-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/backend created
Verification Steps: Monitor the metric with tools like Prometheus or Grafana, and check kubectl get hpa backend to ensure the deployment scales around the 100 rps target.
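A minimal manifest sketch, assuming a custom metrics adapter (for example, the Prometheus Adapter) already exposes a per-pod metric named requests-per-second:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: "100"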
Example 3: Auto-scale a stateful set named database based on memory usage, aiming to keep average memory utilization around 70%, with a minimum of 1 replica and a maximum of 3 replicas. There is no --memory-percent flag; memory-based scaling is expressed as a Resource metric in an autoscaling/v2 manifest (see the sketch below), saved to a file such as database-hpa.yaml and applied:
kubectl apply -f database-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/database created
Verification Steps: Use kubectl get hpa database and kubectl describe statefulset database to check the current replica count, and monitor memory metrics to confirm the auto-scaling behavior.
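A manifest sketch for memory-based scaling, assuming the metrics-server is installed and the database containers declare memory requests (utilization is computed against requests):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: database
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: database
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70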
Example 4: Implement auto-scaling for a deployment named api based on GPU utilization, targeting an average of 80% GPU usage per pod, with a minimum of 2 replicas and a maximum of 6 replicas. GPU utilization is not a built-in resource metric and cannot be targeted by kubectl autoscale; it has to be exposed as a custom per-pod metric (for example, via the NVIDIA DCGM exporter, Prometheus, and the Prometheus Adapter) and referenced from an autoscaling/v2 manifest (see the sketch below):
kubectl apply -f api-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/api created
Verification Steps: Use the NVIDIA DCGM exporter to monitor GPU metrics, and check kubectl get hpa api to ensure auto-scaling aligns with GPU utilization.
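A sketch under those assumptions; the metric name gpu_utilization is a placeholder for whatever name your metrics adapter exposes (with the DCGM exporter it is typically derived from DCGM_FI_DEV_GPU_UTIL):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metric:
        name: gpu_utilization    # placeholder; use the name your adapter exposes
      target:
        type: AverageValue
        averageValue: "80"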
Example 5: Set auto-scaling for a deployment named worker based on an external metric, such as one sourced from a queue or a custom API, maintaining an average of 200 requests per minute per pod, with a minimum of 2 replicas and a maximum of 8 replicas. There is no --external-metric-name flag; external metrics require an external metrics adapter and an autoscaling/v2 manifest with an External metric (see the sketch below):
kubectl apply -f worker-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/worker created
Verification Steps: Monitor the external metric directly or through an integrated monitoring system, and check kubectl get hpa worker to ensure the deployment scales as expected.
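A manifest sketch, assuming an external metrics adapter exposes a metric named requests-per-minute (the exact name and any label selector depend on the adapter's configuration):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: External
    external:
      metric:
        name: requests-per-minute
      target:
        type: AverageValue
        averageValue: "200"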
Example 6: Auto-scale a deployment named load-balancer based on HTTP request rate, aiming for an average of 100 requests per second per pod, with a minimum of 3 replicas and a maximum of 12 replicas. Like Example 2, this uses a per-pod custom metric in an autoscaling/v2 manifest (see the sketch below) rather than kubectl autoscale flags:
kubectl apply -f load-balancer-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/load-balancer created
Verification Steps: Use your monitoring tools to analyze HTTP request metrics and verify that the deployment scales accordingly.
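A manifest sketch, assuming a custom metrics adapter exposes a per-pod metric named http_requests_per_second:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: load-balancer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: load-balancer
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"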
Example 7: Set auto-scaling for a deployment named finance-app based on a custom Prometheus metric derived from a specific query, with a minimum of 4 replicas and a maximum of 10 replicas. The HPA cannot evaluate a PromQL query directly; the Prometheus Adapter first translates the query into a named metric (see the rule sketch below), and the HPA then references that metric exactly like the Pods-metric manifests in Examples 2 and 6, with minReplicas: 4 and maxReplicas: 10:
kubectl apply -f finance-app-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/finance-app created
Verification Steps: Use the Prometheus query interface to validate the metric values and observe the deployment scaling based on the query results.
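A sketch of a Prometheus Adapter rule that turns a counter into a per-pod rate the HPA can consume; the series name http_requests_total is a hypothetical application metric, and the exact location of the adapter's config file depends on how the adapter is deployed:
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
The resulting metric appears to the HPA as http_requests_per_second and can be referenced as a Pods metric.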
Example 8: Auto-scale a deployment named image-processing based on average network traffic, aiming for roughly 1 MB/s per pod, with a minimum of 2 replicas and a maximum of 5 replicas. Network throughput is not a built-in resource metric, so it must be exposed as a custom per-pod metric and referenced from an autoscaling/v2 manifest (see the sketch below):
kubectl apply -f image-processing-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/image-processing created
Verification Steps: Monitor network traffic metrics using tools like Istio or Calico and confirm the deployment scales based on the observed traffic patterns.
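A manifest sketch, assuming a custom metrics adapter exposes per-pod network throughput in bytes per second under the hypothetical name network_bytes_per_second (the quantity 1M means 1,000,000, i.e. about 1 MB/s):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: image-processing
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-processing
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: network_bytes_per_second    # hypothetical metric name
      target:
        type: AverageValue
        averageValue: "1M"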
Example 9: Keep a deployment named analytics available during maintenance or disruptions while auto-scaling it, with a minimum of 3 replicas and a maximum of 7 replicas. kubectl autoscale has no --pdb-based flag, and a Pod Disruption Budget (PDB) does not drive autoscaling; it only limits how many pods may be evicted at once during voluntary disruptions such as node drains. The usual pattern is to combine a metric-based HPA (for example, a CPU target) with a separate PDB:
kubectl autoscale deployment analytics --cpu-percent=70 --min=3 --max=7
Output: horizontalpodautoscaler.autoscaling/analytics autoscaled
Verification Steps: Use kubectl get hpa analytics and kubectl get pdb to confirm both objects exist, and kubectl describe deployment analytics to check that the replica count stays within the configured range during disruptions.
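A PDB sketch to pair with the HPA; the app: analytics label selector is an assumption and should match the labels on the deployment's pods:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: analytics
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: analytics    # assumed pod label
Apply it with kubectl apply -f analytics-pdb.yaml.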
Example 10: Auto-scale a deployment named game-server based on a custom application metric, targeting a specific per-pod threshold, with a minimum of 2 replicas and a maximum of 4 replicas. As in the earlier custom-metric examples, this is configured through an autoscaling/v2 manifest (see the sketch below) rather than kubectl autoscale flags:
kubectl apply -f game-server-hpa.yaml
Output: horizontalpodautoscaler.autoscaling/game-server created
Verification Steps: Use application-specific monitoring tools to verify the metric values, and kubectl get hpa game-server to observe scaling based on application performance.
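A sketch that keeps the placeholders from the original example; substitute the metric name your application exposes through the custom metrics adapter and the threshold that fits it:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: game-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: game-server
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: custom_app_metric        # placeholder metric name
      target:
        type: AverageValue
        averageValue: "100"            # placeholder threshold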
Also check these similar articles:
Scaling Kubernetes Deployments with kubectl scale
Manage Resource Rollouts with kubectl rollout
Efficiently Delete Kubernetes Resources with kubectl delete
Comprehensive Guide to kubectl get Command
Understanding Kubernetes Resources with kubectl explain