Deploying kube-prometheus-stack with Helm: a practical guide


Overview

kube-prometheus-stack is a widely used Helm chart (maintained by the prometheus-community) that bundles Prometheus Operator, Prometheus, Alertmanager, Grafana, exporters, and the Kubernetes manifests (ServiceMonitors, PrometheusRules, etc.) needed for a complete cluster monitoring stack. This article walks through a pragmatic, production-minded deployment using Helm, explains key configuration points, and shows how to extend the stack to scrape your own applications.

Prerequisites

  • A Kubernetes cluster (v1.20+ recommended) with kubectl configured.
  • Helm 3 installed.
  • Cluster-admin or sufficient RBAC privileges to create CRDs and cluster-scoped resources.
  • A storage class for persistent volumes if you want persistence for Prometheus/Alertmanager/Grafana.

Add the Helm repo and update

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update

Basic install (quick start)

This creates a namespace "monitoring" and installs the chart with default settings:

$ helm install kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

The chart will install multiple components. To check:

$ kubectl get pods -n monitoring
$ kubectl get svc -n monitoring

Key components installed

  • Prometheus Operator (operator that manages Prometheus CRs)
  • Prometheus instances (server)
  • Alertmanager
  • Grafana (dashboarding)
  • kube-state-metrics
  • node-exporter
  • PrometheusRule and ServiceMonitor CRs for many Kubernetes components

Important considerations before production deploy

  1. CRDs and upgrades
  • The chart installs CRDs needed by the Prometheus Operator. CRDs are cluster-scoped and managed outside normal Helm templating lifecycle (i.e., Helm won’t upgrade/delete existing CRDs). For production, review CRD changes in chart releases before upgrading.
  • If you want to manage CRDs separately, you can fetch the crds/ directory from the chart, apply the manifests via kubectl apply -f, and then install the chart with the --skip-crds flag so Helm leaves them alone. The default behavior is fine for most cases.
  2. Persistence and storage
  • Prometheus and Alertmanager store important state. Enable persistence and set storageClass, size, and retention. Example values for persistence:
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "rook-block"   # or your cluster's storage class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: "rook-block"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
grafana:
  persistence:
    enabled: true
    storageClassName: "rook-block"
    size: 10Gi
  3. Resource requests/limits and replicas
  • Prometheus can be memory- and CPU-heavy as scrape targets increase. Configure requests/limits and consider remote_write for long-term storage. Example:
prometheus:
  prometheusSpec:
    resources:
      requests:
        memory: 2Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 2
    replicas: 2
  4. Retention and index management
  • Set an appropriate retention period and/or a size-based cap, or use remote write to long-term storage. Example:
prometheus:
  prometheusSpec:
    retention: "15d"
    retentionSize: ""         # optionally cap on-disk size, e.g. "45GB"
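When longer history is needed, remote_write can ship samples out of the cluster. A sketch of the corresponding values (the endpoint URL is a placeholder; point it at your Thanos Receive, Cortex/Mimir, or VictoriaMetrics endpoint):

```yaml
prometheus:
  prometheusSpec:
    remoteWrite:
      # Placeholder endpoint; replace with your long-term storage backend.
      - url: "https://metrics-store.example.com/api/v1/write"
        # Optional: drop noisy series before they leave the cluster.
        writeRelabelConfigs:
          - sourceLabels: [__name__]
            regex: "go_gc_.*"
            action: drop
```

Note that the Prometheus Operator CRDs use camelCase field names (remoteWrite, writeRelabelConfigs), unlike raw Prometheus configuration.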
  5. RBAC and security
  • By default the chart enables the RBAC resources needed for scraping. Review role bindings and tighten access where possible. Enable network policies if you require pod-level network control.
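As a sketch, a NetworkPolicy restricting who may reach Prometheus could look like the following (the pod labels are illustrative and depend on your release; check them with kubectl get pods --show-labels):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-ingress
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus   # adjust to your release's pod labels
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: grafana   # allow Grafana to query Prometheus
      ports:
        - port: 9090
```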

Customizing via values.yaml

Create a values.yaml and pass it to helm install/upgrade. A minimal but practical example:

global:
  rbac:
    create: true

prometheus:
  prometheusSpec:
    replicas: 2
    retention: "15d"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        memory: 2Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 2

alertmanager:
  alertmanagerSpec:
    replicas: 3
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

grafana:
  enabled: true
  adminPassword: "ChangeMeStrongPassword"   # for production, prefer admin.existingSecret
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
    hosts:
      - grafana.example.com
  persistence:
    enabled: true
    storageClassName: "standard"
    size: 10Gi
  sidecar:
    dashboards:
      enabled: true

Installing with custom values:

$ helm install kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml

Accessing Grafana and Prometheus

  • Grafana:

    • If ingress enabled, open the hostname configured.
    • Or port-forward: $ kubectl port-forward -n monitoring svc/kube-prom-stack-grafana 3000:80
    • Admin credentials: $ kubectl get secret -n monitoring kube-prom-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
  • Prometheus:

    • Port-forward to the Prometheus service: $ kubectl port-forward -n monitoring svc/kube-prom-stack-prometheus 9090:9090

Note: service names may vary by release; use kubectl get svc/pods to discover exact names.

Scraping your own applications: ServiceMonitor and PodMonitor

The stack uses ServiceMonitor and PodMonitor CRDs to discover scrape targets. For an application exposing metrics on /metrics, create a Service and a ServiceMonitor:

Example Service:

apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics
  namespace: myapp
  labels:
    app: myapp        # the ServiceMonitor selects Services by these labels
spec:
  selector:
    app: myapp
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100

ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-servicemonitor
  namespace: monitoring     # the operator can watch other namespaces too, depending on configuration
  labels:
    release: kube-prom-stack   # by default the chart's Prometheus only selects monitors carrying the Helm release label
spec:
  selector:
    matchLabels:
      app: myapp
  namespaceSelector:
    matchNames:
      - myapp
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

Key: Ensure labels on your Service match the selector in the ServiceMonitor. The Operator watches ServiceMonitors and adds configured targets to Prometheus.
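For pods that expose metrics directly, without a Service in front, a PodMonitor works the same way. As with ServiceMonitors, the chart's default selectors expect the Helm release label (assumed here to be kube-prom-stack, matching the install command above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp-podmonitor
  namespace: monitoring
  labels:
    release: kube-prom-stack   # matched by the chart's default podMonitorSelector
spec:
  selector:
    matchLabels:
      app: myapp
  namespaceSelector:
    matchNames:
      - myapp
  podMetricsEndpoints:
    - port: metrics        # must match a named container port on the pod
      path: /metrics
      interval: 30s
```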

Alerting: configuring Alertmanager

You can specify the Alertmanager configuration via values.yaml or use a Secret for more sensitive configs.

Inline simple example:

alertmanager:
  alertmanagerSpec:
    configSecret: alertmanager-main   # alternatively set full config in values.yaml

Create a secret with alertmanager config:

$ kubectl -n monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml=./alertmanager.yaml

Alternatively set alertmanager.config in the Helm values (but keep secrets out of plain values in CI/CD if possible).

Alertmanager receiver example (Slack):

receivers:
- name: 'slack-notifications'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/xxxx/xxxx/xxxx'
    channel: '#alerts'
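A receiver alone does nothing until a routing tree points at it. A minimal route for the Slack receiver above might look like this (group keys and timings are illustrative starting points):

```yaml
route:
  receiver: 'slack-notifications'     # default receiver for unmatched alerts
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: 'slack-notifications'
      repeat_interval: 1h             # re-notify critical alerts more often
```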

Security and multi-tenant considerations

  • Limit Grafana access with authentication (LDAP, OAuth via ingress).
  • Use network policies to limit who can access metrics endpoints.
  • Limit RBAC roles for the Prometheus Operator if you want to scope it to specific namespaces.
  • If you need multi-tenant isolation, consider running separate Prometheus instances per tenant or using Thanos for global view + tenant isolation.

Upgrades and maintenance

  • Review changelogs of the Helm chart and CRDs before upgrades.
  • For major chart updates that change CRDs, read the operator/CRD upgrade instructions.
  • Regularly rotate Grafana admin credentials and keep alertmanager secrets secure.
  • Monitor disk usage of Prometheus block storage; set retention or use remote_write to store long-term metrics (Prometheus remote write + Thanos, Cortex, or VictoriaMetrics).

Scaling beyond a single cluster

  • For long-term metrics and global querying, integrate with Thanos (open-source) or Cortex/VictoriaMetrics via remote_write or Thanos sidecar.
  • Thanos can provide object-store-backed long-term storage and global querying across multiple Prometheus instances.
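The chart can inject a Thanos sidecar into Prometheus via prometheusSpec.thanos. A sketch (the secret holding the Thanos object-store config is assumed to exist already; field names have shifted between chart versions, so check your chart's values reference):

```yaml
prometheus:
  prometheusSpec:
    thanos:
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore   # secret created beforehand
          key: objstore.yml       # Thanos bucket config (S3/GCS/...)
```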

Troubleshooting tips

  • Pods in CrashLoopBackOff? Check logs: $ kubectl logs -n monitoring <pod-name>
  • No metrics discovered? Verify Service labels and ServiceMonitor selector/namespaceSelector.
  • Grafana login failing? Check the admin password secret and Grafana pod logs for errors.
  • Prometheus high memory? Reduce scrape interval, drop high-cardinality metrics via relabeling, or add more resources/replicas.

Example relabeling to drop a high-cardinality label (labeldrop removes the label from scraped series; under metric_relabel_configs it applies after the scrape):

metric_relabel_configs:
  - regex: "some_dynamic_label"
    action: labeldrop
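To drop entire series rather than an individual label (e.g. a noisy debug metric), a drop action keyed on the metric name also works (the metric name here is illustrative):

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "myapp_debug_.*"
    action: drop
```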

Useful Helm commands

  • Install: $ helm install kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
  • Upgrade: $ helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
  • Uninstall (note: Helm leaves the CRDs behind; delete them manually if you want them gone): $ helm uninstall kube-prom-stack -n monitoring


This guide gives a practical path to install and run kube-prometheus-stack via Helm and highlights the key configuration points for reliable, production deployments.