A guide to mastering autoscaling in Kubernetes with KEDA
14 Mar 2024


Efficiently managing containerized applications in Kubernetes involves a critical component: autoscaling. This dynamic allocation of resources based on demand is essential for optimizing performance and cost-effectiveness. However, implementing autoscaling can be challenging, especially when integrating external metrics and working with Horizontal Pod Autoscaler (HPA).

This is where Kubernetes Event-driven Autoscaling (KEDA) steps in. With KEDA, autoscaling becomes more flexible and powerful, allowing you to scale your applications based on a wide range of metrics, including Prometheus metrics. This blog will walk you through the process of autoscaling with KEDA and show you how to efficiently scale your applications in a Kubernetes environment.

Why should you autoscale in Kubernetes?

The number of containers running an application in a Kubernetes cluster can vary based on demand. Traditionally, this scaling process required manual intervention or tedious scripting, leading to inefficient resource allocation and potential downtime during peak traffic. The ability to adjust resource allocation automatically has been a game-changer. Let’s find out how.

Benefits of autoscaling in Kubernetes

With the ability to automatically adjust resource allocation based on demand, autoscaling in Kubernetes provides the following benefits: 

  • Improved performance: Autoscaling ensures applications can handle high traffic loads without compromising performance. The applications remain responsive even during peak times by automatically adding resources when needed. 
  • Cost optimization: Autoscaling utilizes resources efficiently, scaling them up or down as needed and eliminating the need for overprovisioning. KEDA can even scale the number of pods down to zero when there are no events to process, which is harder to achieve with the standard HPA, further improving resource utilization and bringing down cloud bills.
  • Simplified management: Autoscaling eliminates the need for manual intervention, reducing the risk of human error and improving overall operational efficiency.

Introducing KEDA: Autoscaling made easy

[Figure: Steps for autoscaling in KEDA]

KEDA (Kubernetes Event-driven Autoscaling) is an open-source project that simplifies and enhances Kubernetes' autoscaling capabilities. It enables autoscaling based on events instead of the traditional resource utilization metrics. KEDA allows you to scale your applications dynamically in response to various events, such as message queue length, HTTP requests, or custom metrics emitted by your application. KEDA supports a wide range of event sources, including Azure Queue Storage, Kafka, RabbitMQ, Prometheus metrics, and more. This flexibility enables the creation of scalable and responsive applications that adapt to different workload patterns.

Let’s dive deeper into KEDA to get a comprehensive, hands-on understanding of how it works. Harnessing the power of KEDA can ensure optimal performance and resource utilization in your containerized applications.

Installing KEDA and configuring autoscaling

Let’s use the Helm chart installation method to deploy KEDA to a minikube cluster. You can use any vendor-specific Kubernetes cluster instead, provided you have appropriate cluster access and a locally installed kubectl client.

helm repo add kedacore https://kedacore.github.io/charts
helm repo update 
helm install keda kedacore/keda --namespace keda --create-namespace 
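
Once the chart is installed, you can verify that the KEDA operator and metrics API server pods are running and that the CRDs listed below have been registered:

kubectl get pods -n keda
kubectl get crd | grep keda.sh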

The KEDA CRDs (custom resource definitions) are available in your cluster once you install the Helm chart. You define the scaling configuration using these CRDs:

  • scaledobjects.keda.sh
  • scaledjobs.keda.sh
  • triggerauthentications.keda.sh
  • clustertriggerauthentications.keda.sh

These custom resources allow you to link an event source (along with its authentication) to a Deployment, Custom Resource, StatefulSet, or Job for scaling purposes.

  • ScaledObjects represent the desired mapping between an event source (e.g., RabbitMQ) and a Kubernetes Deployment, StatefulSet, or any custom resource that defines a /scale subresource.
  • ScaledJobs represent the mapping between the event source and Kubernetes Job.
  • A ScaledObject or ScaledJob may also reference a TriggerAuthentication or ClusterTriggerAuthentication, which contains the authentication configuration or secrets needed to monitor the event source (a minimal sketch follows this list).
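
As a minimal sketch (the Secret name, key, and trigger parameter below are hypothetical), a TriggerAuthentication pulls credentials from a Kubernetes Secret, and a ScaledObject trigger then points to it via authenticationRef:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth   # hypothetical name
  namespace: default
spec:
  secretTargetRef:
  - parameter: host             # trigger parameter to populate
    name: rabbitmq-secret       # hypothetical Kubernetes Secret
    key: connection-string      # key inside that Secret

# Referenced from a ScaledObject trigger:
# triggers:
# - type: rabbitmq
#   authenticationRef:
#     name: rabbitmq-trigger-auth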

KEDA in action

Now let's set up an nginx Deployment and Service using the manifest below.

nginx-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "0.5"
            memory: "200Mi"

---

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
kubectl apply -f nginx-deploy.yaml

Now, let’s set up a ScaledObject based on trigger events such as average CPU and memory utilization, Prometheus metrics, etc. This ScaledObject will auto-scale the replicas of the targeted nginx Deployment.

Note: Please ensure the metrics server is installed in the cluster. If it isn't, you can install it with:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml

Alternatively, you can use the command below in minikube to enable it.

minikube addons enable metrics-server
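
Once the metrics server is running, kubectl top should return live usage numbers, confirming that the CPU trigger below will have data to work with:

kubectl top nodes
kubectl top pods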

scaledobject.yaml manifest

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaledobject
  namespace: default
spec:
  maxReplicaCount: 5
  minReplicaCount: 1
  scaleTargetRef: # the Deployment to scale, in the same namespace
    name: nginx-deployment
  triggers:
  - type: cpu
    metricType: Utilization # Allowed types are 'Utilization' or 'AverageValue'
    metadata:
      value: "50"
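
Apply the ScaledObject and confirm that KEDA has created an HPA for it (the filename below matches the manifest label above; by default the generated HPA is named keda-hpa-<scaledobject-name>):

kubectl apply -f scaledobject.yaml
kubectl get scaledobject cpu-scaledobject
kubectl get hpa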

Validating the scaling by generating load

The command below generates load; once utilization crosses the threshold defined in the ScaledObject, you can see the autoscaling take place.

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://nginx; done"
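
While the load generator is running, you can watch the generated HPA and the nginx pods scale out in a second terminal:

kubectl get hpa -w
kubectl get pods -l app=nginx -w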

Prometheus scaler use case

Next, we will set up Prometheus on the same minikube cluster using Helm charts.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace=prometheus --create-namespace --wait
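
Once the chart is installed, confirm the stack is running and note the exact Prometheus Service name, since it depends on the chart and release name and is what goes into serverAddress in the ScaledObject below:

kubectl get pods -n prometheus
kubectl get svc -n prometheus   # note the Prometheus service name and its 9090 port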

Alternatively, you can set up Prometheus on a single VM and use that Prometheus endpoint in the ScaledObject YAML, as sketched below.
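
In that case, only the serverAddress in the trigger changes, for example (hypothetical address):

serverAddress: http://<prometheus-vm-ip>:9090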

nginx-deploy-scaledobject.yaml

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-deployment-memory
  namespace: default
spec:
  scaleTargetRef:
    name: nginx-deployment
  pollingInterval: 30 # seconds
  minReplicaCount: 2
  maxReplicaCount: 3
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local:9090 # Prometheus service created by the kube-prometheus-stack release above; adjust if your service name differs
      metricName: pod_memory_percentage_usage_nginx
      threshold: '30'
      query: sum(round(max by (pod) (max_over_time(container_memory_usage_bytes{namespace="default",pod=~"nginx.*"}[1m])) / on (pod)(max by (pod) (kube_pod_container_resource_limits{namespace="default",pod=~"nginx.*",resource="memory"})) * 100,0.01))
kubectl apply -f nginx-deploy-scaledobject.yaml
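
Before relying on the scaler, you can sanity-check the PromQL query by port-forwarding Prometheus and running the query from the ScaledObject in the Prometheus UI (service name as discovered earlier; adjust if yours differs):

kubectl port-forward -n prometheus svc/prometheus-kube-prometheus-prometheus 9090:9090
# then open http://localhost:9090 and run the query from the ScaledObject above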

You can generate load using the kubectl command below:

kubectl run -i --tty load-generator-test --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://nginx; done"

Using the Cron scaler

The following specification shows a ScaledObject with a cron trigger that scales the workload to a desired replica count within a defined time window.


apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-cron-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    kind: Deployment
    name: nginx-deployment
  pollingInterval: 30
  triggers:  
  - type: cron
    metadata:
      desiredReplicas: "2"
      end: 0 6 * * *
      start: 0 0 * * *
      timezone: Asia/Calcutta
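
To see the cron trigger in action, apply the manifest (the filename below is just an example) and watch the Deployment's replica count change when the window defined by start and end begins and ends:

kubectl apply -f nginx-cron-scaledobject.yaml   # example filename for the manifest above
kubectl get scaledobject nginx-cron-scaledobject
kubectl get deployment nginx-deployment -w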

KEDA lets you combine CPU/memory triggers with other triggers. You only need to extend the triggers section with the configuration for CPU, memory, cron, or any other supported scaler. The example below demonstrates a trigger section that combines a cron trigger with both CPU and memory triggers.

triggers:
  - type: cron
    metadata:
      desiredReplicas: "30"
      end: 0 6 * * *
      start: 0 0 * * *
      timezone: Europe/London
  - type: cpu
    metricType: Utilization
    metadata:
      value: "90"
  - type: memory
    metricType: Utilization
    metadata:
      value: "90" # illustrative threshold (assumed)

Monitoring KEDA

Monitoring KEDA is possible through Kubernetes events, Horizontal Pod Autoscaler (HPA) metrics, and Prometheus metrics. This comprehensive monitoring approach allows for the observation and adjustment of autoscaling activities to optimize the balance between cost, time, and performance.
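
For example, the following commands surface KEDA's scaling events, the HPAs it generates, and its components (the ScaledObject name matches the CPU example above); the KEDA operator and metrics API server also expose Prometheus metrics that can be scraped:

kubectl describe scaledobject cpu-scaledobject   # Events section shows scaling activity
kubectl get hpa --all-namespaces                 # HPAs generated by KEDA
kubectl get pods -n keda                         # KEDA operator and metrics API server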

Scale your Kubernetes workloads with KEDA today

KEDA represents a significant advancement in the world of Kubernetes autoscaling, particularly for event-driven workloads. By seamlessly integrating with Kubernetes deployments and leveraging its rich ecosystem, KEDA enables efficient and dynamic scaling based on various event sources such as queues, streams, or custom metrics. Throughout this blog, we've explored the key features and benefits of KEDA, including its support for multiple event sources, its lightweight and extensible architecture, and its ability to scale to zero, which optimizes resource utilization and reduces costs. Try it out yourself!
 
