Autoscaling

NKS automatically scales worker nodes up when pods are waiting for capacity and consolidates them when usage drops. You declare which node pools the cluster can use; NKS handles scaling, workload packing, and graceful node termination on top of them.

How Autoscaling Works on NKS

If you’ve used managed node groups in EKS, GKE, or AKS, the NKS model will feel familiar.

You declare one or more node pools on the cluster. Each pool has an instance type and a node count. The cluster’s autoscaler watches for pending pods, picks the best-fitting pool to scale, and adds nodes to it. When nodes become underutilized, the autoscaler cordons and drains them before requesting termination — workloads get a clean shutdown, never an abrupt one.

The model is intentionally simple to operate:

Capacity unit: the node pool. You decide which instance types your cluster has access to by declaring pools with those types.
Scaling engine: NKS runs it at the platform level. There’s no autoscaler controller to install, no custom resources to author, and no CRDs in your cluster to manage. Autoscaling is powered by the open-source Karpenter project under the hood, but you interact with the standard Nirvana cluster and node pool resources — the same ones you’d use without autoscaling.
Scope: autoscaling operates inside the bounds you set. It scales pools you declared up to their limits; it doesn’t create new instance types or new pools on its own.

What Autoscaling Does for You

Workload-driven node selection. As pods queue up, the autoscaler matches their resource requests against your declared pool shapes and picks the smallest pool that can hold them. A cluster with a general pool (4 vCPU) and a compute pool (16 vCPU) will route a small web workload to general and a compute-heavy job to compute — automatically, per pod.
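The selection rule can be modeled in a few lines of Python. This is an illustrative sketch, not the NKS or Karpenter implementation: the `Pool` type, the `pick_pool` helper, and the per-node memory sizes are assumptions made for the example, and a real scheduler also accounts for system overhead, taints, and affinity rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pool:
    name: str
    vcpu: float        # vCPU per node in this pool
    memory_gib: float  # memory per node (assumed values, not from the NKS catalog)


def pick_pool(pools: list[Pool], cpu_request: float, memory_request_gib: float) -> Optional[Pool]:
    """Return the smallest pool whose node shape fits the pod's requests, or None."""
    candidates = [
        p for p in pools
        if p.vcpu >= cpu_request and p.memory_gib >= memory_request_gib
    ]
    # "Smallest" here means fewest vCPUs, with memory as the tie-breaker.
    return min(candidates, key=lambda p: (p.vcpu, p.memory_gib), default=None)


POOLS = [
    Pool("general", vcpu=4, memory_gib=16),
    Pool("compute", vcpu=16, memory_gib=32),
]

web = pick_pool(POOLS, cpu_request=0.5, memory_request_gib=1)    # small web pod
batch = pick_pool(POOLS, cpu_request=12, memory_request_gib=8)   # compute-heavy job
huge = pick_pool(POOLS, cpu_request=32, memory_request_gib=8)    # fits no declared pool
```

In this model the web pod lands on general, the batch job on compute, and the oversized pod gets no pool at all, which mirrors the rule that autoscaling only works within the pools you declared.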

Active consolidation. Rather than waiting for individual nodes to sit idle for long enough to trigger scale-down, the autoscaler continuously evaluates the cluster as a whole. If pods on three half-empty nodes could fit on two, it repacks them. This typically keeps clusters running with significantly fewer nodes than a passive autoscaler over the same workload.
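To make the "three half-empty nodes onto two" case concrete, here is a minimal sketch that uses first-fit-decreasing bin packing on CPU requests only. The heuristic and the `nodes_needed` helper are assumptions chosen for illustration; the actual consolidation logic considers more dimensions (memory, disruption budgets, scheduling constraints).

```python
def nodes_needed(pod_cpu_requests: list[float], node_cpu: float) -> int:
    """First-fit-decreasing estimate of how many identical nodes can hold the pods."""
    free: list[float] = []  # remaining CPU on each node opened so far
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_cpu:
            raise ValueError(f"a pod requesting {request} CPU cannot fit on a {node_cpu}-CPU node")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request  # place the pod on an existing node
                break
        else:
            free.append(node_cpu - request)    # no node had room, so open a new one
    return len(free)


# Three 4-vCPU nodes, each about half full (CPU requests per pod).
current_nodes = [[1.0, 1.0], [2.0], [1.5, 0.5]]
all_pods = [cpu for node in current_nodes for cpu in node]

target = nodes_needed(all_pods, node_cpu=4.0)
removable = len(current_nodes) - target
```

Here the six vCPUs of total requests repack onto two nodes, so one node becomes a candidate for graceful removal.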

Graceful termination. Before any node is terminated, the autoscaler cordons it (no new pods scheduled), evicts existing pods respecting their PodDisruptionBudgets, and waits for clean shutdown. You won’t see pods killed mid-flight by routine scale-down.
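The effect of a PodDisruptionBudget on a drain can be sketched as a simple check: a pod is evicted only if its application still has at least `minAvailable` healthy replicas afterward. The `Pod` type and `plan_evictions` helper are hypothetical, and this simplified model ignores `maxUnavailable`, readiness timing, and retries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    app: str


def plan_evictions(
    pods_on_node: list[Pod],
    healthy_by_app: dict[str, int],
    min_available: dict[str, int],
) -> tuple[list[str], list[str]]:
    """Split a node's pods into those evictable now and those that must wait."""
    healthy = dict(healthy_by_app)  # work on a copy so the caller's counts are untouched
    evict: list[str] = []
    wait: list[str] = []
    for pod in pods_on_node:
        floor = min_available.get(pod.app, 0)
        if healthy.get(pod.app, 0) - 1 >= floor:
            healthy[pod.app] -= 1   # this eviction stays within the budget
            evict.append(pod.name)
        else:
            wait.append(pod.name)   # evicting now would violate the budget
    return evict, wait


node_pods = [Pod("web-1", "web"), Pod("web-2", "web"), Pod("db-1", "db")]
evict, wait = plan_evictions(
    node_pods,
    healthy_by_app={"web": 3, "db": 1},
    min_available={"web": 2, "db": 1},
)
```

In this example only web-1 can be evicted immediately. web-2 and db-1 wait until replacement replicas are healthy elsewhere, which is why a drain can take longer than a raw VM termination.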

Enabling Autoscaling

Autoscaling is a cluster-level toggle. Enable it at cluster creation via API or Terraform:

API

{
  "name": "production-cluster",
  "region": "us-sva-2",
  "autoscaling": true
}

Terraform

resource "nirvana_nks_cluster" "my_cluster" {
  name        = "production-cluster"
  region      = "us-sva-2"
  autoscaling = true
}

When using the terraform-nirvana-nks module, pass autoscaling = true as a module input.

Before autoscaling can do anything useful, the cluster needs at least one node pool. Declare pools at cluster creation or afterward; autoscaling picks them up automatically.

A typical setup with a small “general” pool and a larger “compute” pool:

resource "nirvana_nks_node_pool" "general" {
  cluster_id = nirvana_nks_cluster.my_cluster.id
  name       = "general"
  node_count = 1
  node_config = {
    instance_type = "n1-standard-4"
    boot_volume   = { size = 100, type = "abs" }
  }
}

resource "nirvana_nks_node_pool" "compute" {
  cluster_id = nirvana_nks_cluster.my_cluster.id
  name       = "compute"
  node_count = 1
  node_config = {
    instance_type = "n1-highcpu-16"
    boot_volume   = { size = 200, type = "abs" }
  }
}

With autoscaling on, you don’t need to maintain node_count manually as workloads grow — the autoscaler increments it for you. Setting node_count = 1 at creation establishes the minimum capacity (the pool always has at least one node) and gives the autoscaler a starting point.

Operational Notes

Use the autoscaler for scale-down. Manual pool resize via the API, Terraform, or dashboard terminates VMs immediately and does not drain pods gracefully. When autoscaling is enabled, prefer letting the autoscaler reduce node counts — it handles cordon, drain, and PDB-respecting eviction. If you need to manually shrink a pool, drain affected nodes with kubectl drain first.
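If you do shrink a pool by hand, a small helper can generate the drain commands to run first. This sketch only builds the command strings and does not execute anything. The node name is hypothetical, and the flags shown are standard `kubectl drain` options rather than anything NKS-specific.

```python
import shlex


def drain_commands(node_names: list[str], timeout: str = "5m") -> list[str]:
    """Build one `kubectl drain` command per node that is about to be removed."""
    commands = []
    for node in node_names:
        args = [
            "kubectl", "drain", node,
            "--ignore-daemonsets",      # DaemonSet pods cannot be rescheduled elsewhere
            "--delete-emptydir-data",   # allow eviction of pods using emptyDir (their data is lost)
            f"--timeout={timeout}",     # give up instead of waiting indefinitely
        ]
        commands.append(shlex.join(args))
    return commands


# "general-node-2" is a placeholder; list real names with `kubectl get nodes`.
for command in drain_commands(["general-node-2"]):
    print(command)
```

Run the generated commands and confirm each drain completes before reducing the pool size, since the resize itself will not wait for pods to move.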

Pool catalog defines the instance-type universe. The autoscaler can only scale instance types backed by a declared node pool. If you want the cluster to be able to use a new instance type, declare a new pool for it. A pool with node_count = 1 acts as a low-cost “make this instance type available” entry: the autoscaler can grow it as demand appears.

Resource requests matter. The autoscaler selects node pools based on pod resource requests, not actual usage. A pod that requests 1 CPU and 1Gi of memory but actually uses 6 CPU will be packed onto a small node, where it contends for CPU and degrades. Right-size your requests so the autoscaler can make good packing decisions.
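The gap between requests and real usage is easy to see with a little arithmetic. The sketch below is a hypothetical model (the `pack_report` helper and the pod figures are assumptions) that compares what a packing decision sees with what a 4-vCPU node actually experiences.

```python
def pack_report(node_cpu: float, pods: list[tuple[float, float]]) -> dict:
    """Summarize a node's load; each pod is a (cpu_request, cpu_actual) pair."""
    requested = sum(request for request, _ in pods)
    actual = sum(usage for _, usage in pods)
    return {
        "requested": requested,
        "actual_demand": actual,
        "fits_by_request": requested <= node_cpu,  # what the packing decision sees
        "overcommitted": actual > node_cpu,        # what the node experiences
    }


# One pod requests 1 CPU but uses 6; two well-behaved pods share the node.
report = pack_report(4.0, [(1.0, 6.0), (1.0, 0.5), (1.0, 0.5)])
```

By requests, the three pods need 3 CPU and fit comfortably on the node. By actual demand they need 7 CPU, so every pod on the node competes for CPU.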