Kubernetes Security Hardening: A Production Checklist for 2025

Why your cluster is probably misconfigured

The majority of Kubernetes security incidents trace back to three root causes: overly permissive RBAC, workloads running as root, and secrets stored as plain environment variables. The Kubernetes default configuration is optimised for flexibility, not security. Hardening requires deliberate choices at every layer.

Identity & Access

RBAC least privilege: Every service account should have exactly the permissions it needs, verified quarterly. Use kubectl auth can-i --list --as=system:serviceaccount:ns:sa -n ns to audit what each SA can do, where ns is the namespace and sa is the service account name.
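
As a rough sketch of what a tightly scoped grant can look like, the Role below allows a single service account to read one named ConfigMap and nothing else. The names app-sa, app-config, app-config-reader, and the production namespace are hypothetical placeholders.

```yaml
# Hypothetical names throughout; adjust to your own namespace and workload.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-config-reader
  namespace: production
rules:
  - apiGroups: [""]                  # "" is the core API group
    resources: ["configmaps"]
    resourceNames: ["app-config"]    # limit the grant to one object
    verbs: ["get"]                   # read only: no list, watch, or write verbs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-config-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: app-sa
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-config-reader
```

A namespaced Role with a RoleBinding is used here rather than a ClusterRole, so the grant cannot leak into other namespaces.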

No cluster-admin for workloads — If an application requests cluster-admin, that is a red flag requiring architectural review, not a rubber-stamp approval.

Disable automountServiceAccountToken: Set automountServiceAccountToken: false on every ServiceAccount or pod spec that does not need Kubernetes API access.
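
The field can be set in two places, sketched below with a hypothetical service account (app-sa) and a hypothetical image path. Setting it on the ServiceAccount changes the default for every pod that uses it; setting it on the pod spec applies to that pod only and takes precedence over the ServiceAccount value.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa                 # hypothetical name
  namespace: production
automountServiceAccountToken: false   # default for all pods using this SA
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: production
spec:
  serviceAccountName: app-sa
  automountServiceAccountToken: false # pod-level value overrides the SA value
  containers:
    - name: app
      # Hypothetical image path, shown only for illustration
      image: europe-west1-docker.pkg.dev/my-project/apps/web:1.4.2
```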

Workload Security

Pod Security Standards (Restricted): Enforce the restricted profile on production namespaces using the built-in Pod Security Admission controller, which is configured through namespace labels. The profile rejects privileged containers, hostPath mounts, and containers that run as root.
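
Pod Security Admission is switched on per namespace with labels. A minimal sketch for a namespace called production (the name is an assumption) might look like this:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface violations as client warnings and audit annotations
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

On an existing namespace it is usually safer to start with only the warn and audit labels, fix what they report, and add enforce afterwards. A server-side dry run of the label change (kubectl label --dry-run=server --overwrite ns production pod-security.kubernetes.io/enforce=restricted) reports which running pods would violate the profile.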

Non-root containers: Set runAsNonRoot: true and a non-zero runAsUser (for example 1000) in your security context. Most application containers do not need root. Those that claim to need it should be investigated.

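Read-only root filesystem: Setting readOnlyRootFilesystem: true stops an attacker who has compromised a container from writing malware or replacing binaries on its root filesystem. It does not help after a container escape, because it only applies inside the container. Mount writable volumes only where needed (e.g., /tmp).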

Drop ALL capabilities: Set capabilities: drop: [ALL] and add back only what you need (NET_BIND_SERVICE to bind ports below 1024 such as 80/443, nothing else for most workloads).
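
Put together, the workload settings in this section could look like the pod sketch below. The pod name, label, UID, and image path are assumptions for illustration. allowPrivilegeEscalation: false and the RuntimeDefault seccomp profile are included because the restricted Pod Security profile requires them.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                    # hypothetical workload
  namespace: production
  labels:
    app: web
spec:
  securityContext:
    runAsNonRoot: true         # kubelet refuses to start containers as UID 0
    runAsUser: 1000            # any non-zero UID the image supports
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: web
      image: europe-west1-docker.pkg.dev/my-project/apps/web:1.4.2  # hypothetical
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]        # add NET_BIND_SERVICE only if binding ports < 1024
      volumeMounts:
        - name: tmp
          mountPath: /tmp      # the only writable path
  volumes:
    - name: tmp
      emptyDir: {}
```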

Network Policies

By default, all pods can talk to all pods. Network policies are your firewall inside the cluster, provided your CNI plugin enforces them (a NetworkPolicy has no effect on a CNI that does not implement it). Every namespace should have a default-deny ingress and egress policy, with explicit allow rules for each required communication path:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]

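Then add specific allow policies. A service that only needs to receive traffic from the ingress controller and talk to the database should have exactly two policies, nothing more. Because default-deny egress also blocks DNS, the egress policy usually needs a rule for cluster DNS as well.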
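
A sketch of those two allow policies follows. It assumes the service's pods are labelled app: web and listen on 8080, the ingress controller runs in a namespace named ingress-nginx, the database pods are labelled app: postgres in the same namespace, and cluster DNS runs in kube-system with the k8s-app: kube-dns label. All of these are placeholders to adapt.

```yaml
# Policy 1: accept traffic only from the ingress controller namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-from-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
---
# Policy 2: allow outbound traffic only to the database and to cluster DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes: [Egress]
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        # namespaceSelector and podSelector in one entry are ANDed together
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

With default-deny ingress in place, the database pods need their own ingress policy admitting app: web; that policy belongs to the database, not to this service.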

Secrets Management

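External Secrets Operator: Sync secrets from GCP Secret Manager, AWS Secrets Manager, or HashiCorp Vault into the cluster on a refresh interval, so the source of truth stays in the external manager. The operator still writes them as native Kubernetes Secrets, which are base64-encoded, not encrypted, by default; anyone with etcd access can read them unless encryption at rest is enabled.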
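
For orientation, an ExternalSecret resource might look like the sketch below. The store name gcp-secret-manager, the remote key prod-db-password, and the target name db-credentials are hypothetical, and the apiVersion depends on the operator release installed (newer releases also serve external-secrets.io/v1). The operator reconciles on refreshInterval and writes the result into an ordinary Kubernetes Secret.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h              # how often the operator re-reads the source
  secretStoreRef:
    kind: ClusterSecretStore
    name: gcp-secret-manager       # hypothetical store, defined separately
  target:
    name: db-credentials           # Kubernetes Secret the operator creates
  data:
    - secretKey: password          # key inside the Kubernetes Secret
      remoteRef:
        key: prod-db-password      # hypothetical name in the external manager
```

Because the result is a native Secret, etcd encryption at rest still matters even when the values originate elsewhere.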

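Encrypt etcd at rest: Enable an encryption provider configuration for Secrets on your control plane. On managed clusters (GKE, EKS, AKS), verify that KMS-backed Secrets encryption with a customer-managed key is enabled. CMEK is the GCP term; EKS and AKS offer equivalent KMS-based envelope encryption.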
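
On a self-managed control plane, the encryption provider configuration is a file passed to kube-apiserver with --encryption-provider-config. The sketch below assumes a KMS v2 plugin is already running on the control plane node; the plugin name and socket path are hypothetical.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # First provider is used for writes
      - kms:
          apiVersion: v2
          name: example-kms-plugin                    # hypothetical plugin name
          endpoint: unix:///var/run/kms/plugin.sock   # hypothetical socket path
          timeout: 3s
      # Fallback so Secrets written before encryption was enabled stay readable
      - identity: {}
```

Secrets that already exist are only encrypted when they are next written, so a one-off rewrite (kubectl get secrets --all-namespaces -o json | kubectl replace -f -) is needed after enabling the configuration.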

Image Security

Scan every image in CI: Trivy or Snyk Container should fail builds that contain Critical CVEs. Do not patch containers in place in production; fix the image upstream and rebuild.
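
What the CI gate looks like depends on your pipeline. As one illustration, assuming GitHub Actions (which this checklist does not otherwise presume) and the aquasecurity/trivy-action action, a workflow could fail the build on Critical findings like this. The workflow name and image tag are placeholders.

```yaml
name: image-scan                 # hypothetical workflow name
on: [push]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t app:${{ github.sha }} .
      - name: Scan image with Trivy
        # Pin to a release tag or commit SHA in real pipelines
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: app:${{ github.sha }}
          severity: CRITICAL
          exit-code: '1'         # non-zero exit fails the job
          ignore-unfixed: true   # skip CVEs that have no fix available yet
```

The equivalent CLI call in any other CI system is trivy image --severity CRITICAL --exit-code 1 followed by the image reference.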

Admission controller (OPA Gatekeeper): Enforce that only images from your approved registry (europe-west1-docker.pkg.dev/my-project/*) can run in production namespaces. Block :latest tags, which are mutable and make it impossible to tell which build is actually running.
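
Gatekeeper constraints need a matching ConstraintTemplate. The sketch below assumes the K8sAllowedRepos and K8sDisallowedTags templates from the community gatekeeper-library are already installed; the constraint names are hypothetical.

```yaml
# Only allow images whose reference starts with the approved registry prefix
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: prod-approved-registry
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production"]
  parameters:
    repos:
      # Trailing slash prevents matching look-alike projects such as my-project-x
      - "europe-west1-docker.pkg.dev/my-project/"
---
# Reject images tagged :latest
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowedTags
metadata:
  name: prod-no-latest-tag
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production"]
  parameters:
    tags: ["latest"]
```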

Image signing: Use Cosign (part of the Sigstore project) to sign images at build time and verify signatures at admission. Unsigned images should not run in production.
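
Verification at admission needs a component that understands Cosign signatures. One option, used here purely as an example and not something this checklist prescribes, is the Sigstore policy-controller with a ClusterImagePolicy. The policy name is hypothetical and the key body is a placeholder for the contents of your own cosign.pub.

```yaml
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: require-signed-images     # hypothetical name
spec:
  images:
    # Apply to everything pulled from the approved registry
    - glob: "europe-west1-docker.pkg.dev/my-project/**"
  authorities:
    - key:
        # Placeholder: paste the public half of the key used by cosign sign
        data: |
          -----BEGIN PUBLIC KEY-----
          <contents of cosign.pub>
          -----END PUBLIC KEY-----
```

The policy-controller only acts on namespaces that opt in, so the production namespace would also need the label policy.sigstore.dev/include: "true".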

Runtime Security

Falco, a CNCF graduated project, is the de facto open-source tool for Kubernetes runtime threat detection. It monitors syscalls and generates alerts when activity matches a rule, for example a web server spawning a shell or reading /etc/shadow:

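- rule: Terminal shell in container
  desc: A shell was spawned inside a container
  # spawned_process, container and shell_procs are macros from Falco's default ruleset.
  # spawned_process limits the rule to exec events, so it fires once per shell start.
  condition: >
    spawned_process and
    container and
    shell_procs and
    k8s.pod.name != ""
  output: >
    Shell opened in container (user=%user.name
    container=%container.name pod=%k8s.pod.name)
  priority: WARNING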

Connect Falco alerts to your SIEM or PagerDuty. A shell spawning in a production container at 03:00 UTC is worth waking someone up for.

Audit Logging

Enable API server audit logging with a policy that captures requests at the RequestResponse level for RBAC objects and ServiceAccounts, and at the Metadata level for everything else. Keep Secrets at Metadata as well: RequestResponse would write secret payloads into the audit log. Ship logs to your SIEM within 5 minutes. Retention: 90 days minimum for compliance.
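
A minimal audit policy along these lines is sketched below; it is passed to kube-apiserver with --audit-policy-file. Rules are evaluated top to bottom and the first match wins. This sketch deliberately logs Secrets, ConfigMaps, and token reviews at Metadata only, because request and response bodies for those resources would put sensitive values into the log.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived              # avoid a duplicate entry per request
rules:
  # Sensitive payloads: record who touched them, never the body
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
      - group: "authentication.k8s.io"
        resources: ["tokenreviews"]
  # Full request and response bodies for access-control changes
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
      - group: ""
        # Matches the resource only; the serviceaccounts/token subresource
        # falls through to the Metadata rule below
        resources: ["serviceaccounts"]
  # Everything else
  - level: Metadata
```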

The bottom line

Security hardening is not a one-time task; it is an ongoing posture. Run a quarterly RBAC audit, update your Falco rules when you add new workloads, and re-scan images on a schedule even if they have not changed (new CVEs are disclosed daily). The clusters that get compromised are rarely the ones attacked by sophisticated actors; they are the ones that never covered the basics.