Overview
kube-prometheus-stack is a widely used Helm chart (maintained by the prometheus-community project) that bundles the Prometheus Operator, Prometheus, Alertmanager, Grafana, exporters, and the Kubernetes manifests (ServiceMonitors, PrometheusRules, etc.) needed for a complete cluster monitoring stack. This article walks through a pragmatic, production-minded deployment using Helm, explains the key configuration points, and shows how to extend the stack to scrape your own applications.
Prerequisites
- A Kubernetes cluster (v1.20+ recommended) with kubectl configured.
- Helm 3 installed.
- Cluster-admin or sufficient RBAC privileges to create CRDs and cluster-scoped resources.
- A storage class for persistent volumes if you want persistence for Prometheus/Alertmanager/Grafana.
Add the Helm repo and update
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
Basic install (quick start)
This creates a namespace "monitoring" and installs the chart with default settings:
$ helm install kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
The chart will install multiple components. To check:
$ kubectl get pods -n monitoring
$ kubectl get svc -n monitoring
Key components installed
- Prometheus Operator (operator that manages Prometheus CRs)
- Prometheus instances (server)
- Alertmanager
- Grafana (dashboarding)
- kube-state-metrics
- node-exporter
- PrometheusRule and ServiceMonitor CRs for many Kubernetes components
Important considerations before production deploy
- CRDs and upgrades
- The chart installs the CRDs needed by the Prometheus Operator. CRDs are cluster-scoped and managed outside the normal Helm templating lifecycle (Helm won't upgrade or delete existing CRDs). For production, review CRD changes in chart releases before upgrading.
- If you want to manage CRDs separately, you can extract the CRDs from the chart, apply them with kubectl apply -f, and then install the chart with Helm's --skip-crds flag. The default behavior is fine for most cases.
- Persistence and storage
Prometheus and Alertmanager store important state. Enable persistence and set storageClass, size, and retention. Example values for persistence:
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "rook-block" # or your cluster's storage class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: "rook-block"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
grafana:
  persistence:
    enabled: true
    storageClassName: "rook-block"
    size: 10Gi
- Resource requests/limits and replicas
Prometheus can become memory- and CPU-heavy as the number of scrape targets grows. Configure requests/limits and consider remote_write for long-term storage.
prometheus:
  prometheusSpec:
    resources:
      requests:
        memory: 2Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 2
    replicas: 2
- Retention and index management
Set an appropriate retention period, or use remote write to ship metrics to long-term storage.
prometheus:
  prometheusSpec:
    retention: "15d"
    retentionSize: "" # optionally cap disk usage, e.g. "45GB"
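To pick sensible retention and volume sizes, the Prometheus storage docs give a rule of thumb: needed disk ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1-2 bytes per sample after compression. A small sketch of that arithmetic (the sample rate and bytes-per-sample figures below are illustrative assumptions):

```python
def estimate_disk_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough on-disk size estimate for a Prometheus retention window.

    Uses the rule of thumb from the Prometheus storage docs:
    needed_bytes ~= retention_seconds * samples_per_second * bytes_per_sample.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample

# Example: 15d retention at an assumed 10,000 samples/s ingest rate
gib = estimate_disk_bytes(15, 10_000) / 2**30
print(f"~{gib:.1f} GiB")  # prints ~24.1 GiB
```

This is why the 50Gi request above leaves comfortable headroom for a 15-day window at moderate ingest rates; re-run the estimate with your own sample rate before committing to a volume size.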
- RBAC and security
By default the chart creates the RBAC resources needed for scraping. Review role bindings and tighten access where possible. Enable network policies if you require pod-level network control.
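As a sketch of the network-policy point above, a NetworkPolicy like the following restricts ingress to the Prometheus pods. The label selectors here are illustrative assumptions; check the labels your release actually applies with kubectl get pods --show-labels.

```yaml
# Hypothetical policy: only pods in the monitoring namespace may reach Prometheus.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-ingress
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus   # assumption: matches your Prometheus pods
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9090
```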
Customizing via values.yaml
Create a values.yaml and pass it to helm install/upgrade. A minimal but practical example:
global:
  rbac:
    create: true
prometheus:
  prometheusSpec:
    replicas: 2
    retention: "15d"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        memory: 2Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 2
alertmanager:
  alertmanagerSpec:
    replicas: 3
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
grafana:
  enabled: true
  adminPassword: "ChangeMeStrongPassword"
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
    hosts:
      - grafana.example.com
  persistence:
    enabled: true
    storageClassName: "standard"
    size: 10Gi
  sidecar:
    dashboards:
      enabled: true
Installing with custom values:
$ helm install kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
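Rather than committing adminPassword in values.yaml, the bundled Grafana subchart can read credentials from a pre-existing Secret. A sketch, assuming you have created a Secret named grafana-admin-credentials with admin-user and admin-password keys beforehand:

```yaml
# Assumption: the Secret grafana-admin-credentials already exists in the
# monitoring namespace with keys admin-user and admin-password.
grafana:
  admin:
    existingSecret: grafana-admin-credentials
    userKey: admin-user
    passwordKey: admin-password
```

This keeps the credential out of values files checked into version control or passed through CI/CD.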
Accessing Grafana and Prometheus
- Grafana:
  - If ingress is enabled, open the configured hostname.
  - Or port-forward: $ kubectl port-forward -n monitoring svc/kube-prom-stack-grafana 3000:80
  - Admin credentials: $ kubectl get secret -n monitoring kube-prom-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
- Prometheus:
  - Port-forward to the Prometheus service: $ kubectl port-forward -n monitoring svc/kube-prom-stack-prometheus 9090:9090
Note: service names may vary by release; use kubectl get svc/pods to discover exact names.
Scraping your own applications: ServiceMonitor and PodMonitor
The stack uses the ServiceMonitor and PodMonitor CRDs to discover scrape targets. For an application exposing metrics on /metrics, create a Service and a ServiceMonitor:
Example Service:
apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics
  namespace: myapp
  labels:
    app: myapp # the ServiceMonitor selects Services by this label
spec:
  selector:
    app: myapp
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-servicemonitor
  namespace: monitoring
  labels:
    release: kube-prom-stack # by default the chart's Prometheus only selects ServiceMonitors carrying the Helm release label (serviceMonitorSelectorNilUsesHelmValues)
spec:
  selector:
    matchLabels:
      app: myapp
  namespaceSelector:
    matchNames:
      - myapp
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
Key: ensure the labels on your Service match the selector in the ServiceMonitor, and that the ServiceMonitor itself carries the labels Prometheus's serviceMonitorSelector expects. The operator watches ServiceMonitors and adds the matching targets to the Prometheus configuration.
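For pods that aren't fronted by a Service (jobs, DaemonSets exposing metrics directly), a PodMonitor plays the same role but selects pods by label. A sketch mirroring the ServiceMonitor above; the release label value is an assumption matching the Helm release name used in this guide:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp-podmonitor
  namespace: monitoring
  labels:
    release: kube-prom-stack   # assumption: matches your Helm release name
spec:
  selector:
    matchLabels:
      app: myapp               # selects pods, not Services
  namespaceSelector:
    matchNames:
      - myapp
  podMetricsEndpoints:
    - port: metrics            # must match a named containerPort on the pod
      path: /metrics
      interval: 30s
```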
Alerting: configuring Alertmanager
You can specify the Alertmanager configuration via values.yaml, or use a Secret for more sensitive configs.
Inline simple example:
alertmanager:
  alertmanagerSpec:
    configSecret: alertmanager-main # alternatively set the full config in values.yaml
Create a secret with the Alertmanager config:
$ kubectl -n monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml=./alertmanager.yaml
Alternatively set alertmanager.config in the Helm values (but keep secrets out of plain values in CI/CD if possible).
Alertmanager receiver example (Slack):
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxxx/xxxx/xxxx'
        channel: '#alerts'
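A receivers block alone is not a valid alertmanager.yaml; Alertmanager also requires a route. A minimal complete sketch built around the Slack receiver above, with the webhook URL as a placeholder and the grouping/timing values as illustrative defaults:

```yaml
global:
  resolve_timeout: 5m
route:
  receiver: slack-notifications      # default receiver for all alerts
  group_by: ['alertname', 'namespace']
  group_wait: 30s                    # wait before sending the first notification for a group
  group_interval: 5m                 # wait before sending updates for a group
  repeat_interval: 4h                # re-notify for still-firing alerts
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxxx/xxxx/xxxx'
        channel: '#alerts'
        send_resolved: true
```

Save this as alertmanager.yaml and load it via the secret shown above.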
Security and multi-tenant considerations
- Limit Grafana access with authentication (LDAP, OAuth via ingress).
- Use network policies to limit who can access metrics endpoints.
- Limit RBAC roles for the Prometheus Operator if you want to scope it to specific namespaces.
- If you need multi-tenant isolation, consider running separate Prometheus instances per tenant or using Thanos for global view + tenant isolation.
Upgrades and maintenance
- Review changelogs of the Helm chart and CRDs before upgrades.
- For major chart updates that change CRDs, read the operator/CRD upgrade instructions.
- Regularly rotate Grafana admin credentials and keep alertmanager secrets secure.
- Monitor disk usage of Prometheus block storage; set retention or use remote_write to store long-term metrics (Prometheus remote write + Thanos, Cortex, or VictoriaMetrics).
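The remote_write option mentioned above is configured through the chart's prometheusSpec. A sketch, where the endpoint URL and the remote-write-credentials Secret are placeholders you would supply:

```yaml
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: "https://metrics.example.com/api/v1/write"  # placeholder endpoint
        basicAuth:
          username:
            name: remote-write-credentials  # assumption: pre-created Secret
            key: username
          password:
            name: remote-write-credentials
            key: password
```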
Scaling beyond a single cluster
- For long-term metrics and global querying, integrate with Thanos (open-source) or Cortex/VictoriaMetrics via remote_write or Thanos sidecar.
- Thanos can provide object-store-backed long-term storage and global querying across multiple Prometheus instances.
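The Thanos sidecar route is also wired up through prometheusSpec. A sketch, assuming a Secret named thanos-objstore holding a Thanos object-store config under the key objstore.yml; the exact field shape has changed across operator and chart versions, so check your chart's documentation:

```yaml
prometheus:
  prometheusSpec:
    thanos:
      # SecretKeySelector pointing at the Thanos object-store configuration.
      # Secret name and key here are assumptions.
      objectStorageConfig:
        name: thanos-objstore
        key: objstore.yml
```

With the sidecar enabled, a separate Thanos Query deployment can fan out queries across every Prometheus instance and the object store.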
Troubleshooting tips
- Pods CrashLooping? Check logs: $ kubectl logs -n monitoring <pod-name>
- No metrics discovered? Verify Service labels and ServiceMonitor selector/namespaceSelector.
- Grafana login failing? Check the admin password secret and Grafana pod logs for errors.
- Prometheus high memory? Reduce scrape interval, drop high-cardinality metrics via relabeling, or add more resources/replicas.
Example relabeling to drop a high-cardinality label (note: action labeldrop removes the label itself; action drop would discard entire targets or series):
metric_relabel_configs:
  - regex: some_dynamic_label
    action: labeldrop
Useful Helm commands
- Install: $ helm install kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
- Upgrade: $ helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
- Uninstall (note: Helm leaves the CRDs behind; delete them manually if you want them gone): $ helm uninstall kube-prom-stack -n monitoring
References and further reading
- prometheus-community/kube-prometheus-stack (Helm chart): https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
- Chart documentation on Artifact Hub: https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack
- Prometheus Operator docs: https://github.com/prometheus-operator/prometheus-operator
- Prometheus docs (storage, relabeling): https://prometheus.io/docs/
- Grafana docs: https://grafana.com/docs/
- Thanos for long-term storage and global query: https://thanos.io/
This guide gives a practical path to install and run kube-prometheus-stack via Helm and highlights the key configuration points for reliable, production deployments.