This article is the second in a three-part series on Thanos monitoring.
- Part 1: “Thanos: Long-term Storage for Prometheus Metrics”
- Part 2: “Deploying Thanos and Prometheus on a K8s Cluster”
- Part 3: “Thanos: Aggregating Multiple Prometheus”
Today, it’s time for action: we fire up our Kubernetes cluster and focus on how to deploy Thanos.
Note: Our focus here is on deploying a single Prometheus and Thanos instance on Kubernetes, but Thanos can also be used and deployed outside of Kubernetes.
Deploying Thanos on K8s
We need object storage for Thanos to store metrics. In this article, we use S3-compatible storage with a MinIO instance inside the K8s cluster.
In real life, using the object storage offered by your cloud provider is just as relevant.
# We install MinIO
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install minio bitnami/minio --namespace monitoring --create-namespace --set persistence.enabled=false
# We retrieve the credentials
export ROOT_USER=$(kubectl get secret --namespace monitoring minio -o jsonpath="{.data.root-user}" | base64 -d)
export ROOT_PASSWORD=$(kubectl get secret --namespace monitoring minio -o jsonpath="{.data.root-password}" | base64 -d)
# We create a 'thanos' bucket
kubectl run --namespace monitoring minio-client --rm --tty -i --restart='Never' --env MINIO_SERVER_ROOT_USER=$ROOT_USER --env MINIO_SERVER_ROOT_PASSWORD=$ROOT_PASSWORD --env MINIO_SERVER_HOST=minio --image docker.io/bitnami/minio-client -- mb -p minio/thanos
For the sake of this demonstration and to keep things simple, I am not using persistent storage here. Obviously, do not replicate this in production, or you will see your data disappear in a snap 🙂
Likewise, for simplicity, I am operating as the admin user, but normally a dedicated user with the appropriate permissions should be created (for example on S3).
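For reference, keeping the MinIO data on a persistent volume is only a matter of chart values. A minimal sketch, with an arbitrary volume size:
# Same chart, with persistence enabled this time
helm upgrade --install minio bitnami/minio --namespace monitoring \
  --set persistence.enabled=true \
  --set persistence.size=50Gi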
Configuring Thanos Access to Object Storage
Thanos needs to know how to connect to the object storage. We create a Secret pointing to the storage we just created:
export ENDPOINT="minio:9000" BUCKET=thanos
cat > objstore.yml << EOF
type: S3
config:
  endpoint: "${ENDPOINT}"
  bucket: "${BUCKET}"
  access_key: "${ROOT_USER}"
  secret_key: "${ROOT_PASSWORD}"
  insecure: true
EOF
kubectl create secret generic thanos-objstore --namespace monitoring --from-file=objstore.yml -o yaml --dry-run=client | kubectl apply --namespace monitoring -f -
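Before moving on, we can sanity-check the content of the Secret:
# Decode the objstore.yml stored in the Secret
kubectl get secret thanos-objstore --namespace monitoring -o jsonpath='{.data.objstore\.yml}' | base64 -d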
Configuring the Thanos Sidecar
Below, I outline two options:
- via prometheus-operator
- via the official Prometheus Helm chart
With the Prometheus-Operator
The Prometheus Custom Resource
The prometheus-operator automatically adds the sidecar when the thanos section of the Prometheus Custom Resource is filled in.
So, you just need to provide the configuration for object storage, with the secret created previously:
spec:
  thanos:
    objectStorageConfig:
      name: thanos-objstore
      key: objstore.yml
If needed, it is possible to configure other elements of the sidecar, like the exact image to use (see the documentation).
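For context, a minimal Prometheus Custom Resource with the sidecar enabled might look like this (the names and label are illustrative; only the thanos section matters here):
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: myprom
  namespace: monitoring
spec:
  replicas: 2
  # At least one external label to identify this Prometheus
  externalLabels:
    prometheus_from: myprom
  # The operator adds the Thanos sidecar from this section
  thanos:
    objectStorageConfig:
      name: thanos-objstore
      key: objstore.yml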
The Kube-Prometheus-Stack Helm Chart
The kube-prometheus-stack chart not only allows configuring the Prometheus Custom Resource, but also adds a Service and a ServiceMonitor to scrape metrics from the sidecar, using the following values:
prometheus:
  prometheusSpec:
    thanos:
      objectStorageConfig:
        name: thanos-objstore
        key: objstore.yml
  thanosService:
    enabled: true
  thanosServiceMonitor:
    enabled: true
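Assuming these values are saved in a file such as kps.values.yaml (an arbitrary name), applying them is the usual chart installation:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace --values kps.values.yaml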
Via the Official Prometheus Helm Chart
In the rarer case where you are not using the prometheus-operator in the cluster but the plain Prometheus chart, configuring the sidecar is also possible. It just requires slightly longer values:
server:
  extraFlags:
    - web.enable-lifecycle
    - web.enable-admin-api
    - storage.tsdb.min-block-duration=2h
    - storage.tsdb.max-block-duration=2h
  global:
    external_labels:
      prometheus_from: myprom
  service:
    gRPC:
      enabled: true
  sidecarContainers:
    thanos-sidecar:
      image: quay.io/thanos/thanos:v0.31.0
      imagePullPolicy: IfNotPresent
      args:
        - sidecar
        - --prometheus.url=http://localhost:9090/
        - --tsdb.path=/prometheus
        - --grpc-address=[$(POD_IP)]:10901
        - --http-address=[$(POD_IP)]:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
      volumeMounts:
        - mountPath: /prometheus
          name: storage-volume
      env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              name: thanos-objstore
              key: objstore.yml
      ports:
        - name: http
          containerPort: 10902
          protocol: TCP
        - name: grpc
          containerPort: 10901
          protocol: TCP
Prometheus requires several configurations:
- Enable the lifecycle API so that the sidecar can trigger a reload when the configuration changes
- Enable the admin API to retrieve metadata such as external_labels
- Ensure that the blocks generated by Prometheus are not compacted (min block duration == max block duration)
- Have at least one external label to identify the Prometheus instance
The remaining values set up the Thanos sidecar container and ensure it is exposed through the service.
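With these values saved, for example, in prometheus.values.yaml, deploying and checking the sidecar looks like this (the release and resource names are assumptions to adapt):
# Deploy Prometheus with the Thanos sidecar from the community chart
helm upgrade --install prometheus prometheus-community/prometheus \
  --namespace monitoring --values prometheus.values.yaml
# The sidecar exposes readiness/health endpoints on its HTTP port
kubectl port-forward --namespace monitoring deploy/prometheus-server 10902 &
curl http://localhost:10902/-/ready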
Deploying Other Thanos Components
We will use the comprehensive bitnami/thanos chart with the following values:
existingObjstoreSecret: thanos-objstore
query:
  dnsDiscovery:
    sidecarsService: "prometheus-operated"
    sidecarsNamespace: "monitoring"
compactor:
  enabled: true
storegateway:
  enabled: true
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
Save these values in thanos.values.yaml, then deploy with:
helm upgrade --install thanos bitnami/thanos --values thanos.values.yaml
This results in four Thanos components being installed:
- query, which will use DNS service discovery to connect to the StoreAPI of the sidecar and storegateway
- query-frontend, configured to target query
- storegateway, exposing the content of the object storage
- compactor, handling compaction and retention in the object storage
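Once the chart is deployed, a quick look at the pods confirms that these components are running (the label assumes the release is named thanos, as above):
kubectl get pods --namespace monitoring -l app.kubernetes.io/instance=thanos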
By default, the compactor and storegateway require a persistent volume. In practice, they can do without: the storegateway will simply take longer to start (the time needed to synchronize metadata from the object storage), and the compactor will work on ephemeral storage (be careful to set resource limits in that case). For the simplicity of the demo, I will disable persistence:
compactor:
  enabled: true
  persistence:
    enabled: false
storegateway:
  enabled: true
  persistence:
    enabled: false
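Since the compactor then works on ephemeral storage, it is worth bounding that storage explicitly, as mentioned above. A possible sketch using the chart's resources values (the sizes are purely illustrative):
compactor:
  resources:
    requests:
      ephemeral-storage: 10Gi
    limits:
      ephemeral-storage: 20Gi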
The chart is flexible enough to configure the various Thanos parameters (caches, retention durations, …) and Kubernetes manifests (ingress, resources, affinity/taints, …).
An example configuration:
query:
  replicaCount: 3
  replicaLabel: prometheus_replica
  podAntiAffinityPreset: hard
  pdb:
    create: true
  existingSDConfigmap: thanos-storeapi-file-sd
  extraFlags:
    - --query.promql-engine=thanos
queryFrontend:
  replicaCount: 2
  podAntiAffinityPreset: hard
  pdb:
    create: true
  extraFlags:
    - '--query-frontend.downstream-tripper-config="max_idle_conns_per_host": 100'
  config: |-
    type: REDIS
    config:
      addr: 'redis:6379'
compactor:
  enabled: true
  retentionResolutionRaw: 90d
  retentionResolution5m: 180d
  retentionResolution1h: 2y
storegateway:
  enabled: true
  replicaCount: 2
  podAntiAffinityPreset: hard
  pdb:
    create: true
  config: |-
    type: REDIS
    config:
      addr: 'redis:6379'
      cache_size: '1G'
With these values, we have:
- The query, query-frontend, and storegateway with multiple replicas, anti-affinity (requiredDuringScheduling) between them, and PodDisruptionBudgets
- The choice of the label that differentiates the HA Prometheus replicas on the query (replicaLabel)
- The use of a ConfigMap with the list of StoreAPIs for the query’s service discovery (a sketch of such a ConfigMap follows this list)
- Additional arguments for the query and query-frontend to configure certain options
- The configuration of Redis as a cache for the query-frontend and the storegateway
- The retention durations of different metrics applied by the compactor
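The ConfigMap referenced by existingSDConfigmap must contain the StoreAPI endpoints in the Prometheus file_sd format. A minimal sketch, assuming the servicediscovery.yml key used by the chart and example targets:
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-storeapi-file-sd
  namespace: monitoring
data:
  servicediscovery.yml: |-
    - targets:
        - prometheus-operated:10901
        - thanos-storegateway:10901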
Using Thanos Instead of Prometheus
For PromQL Queries via the Web UI
The Thanos query provides a web interface similar to Prometheus. It can be accessed via port-forwarding or by configuring an ingress. If the query-frontend is used, the ingress must be configured in its values:
queryFrontend:
  ingress:
    enabled: true
    hostname: thanos.example.com
    ingressClassName: public
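Without an ingress, a simple port-forward on the query-frontend service also does the job for a quick look (the service name assumes a release named thanos):
kubectl port-forward --namespace monitoring svc/thanos-query-frontend 9090:9090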
We can run PromQL queries, just like with Prometheus:
The alerts, targets, and rules from different Prometheus instances are also accessible:
And we can list the different StoreAPIs configured on the query:
Thanos as a Datasource for Grafana
Thanos is most frequently used in conjunction with Grafana. The Thanos query exposes the same API as Prometheus, so all it takes is adding a Prometheus-type datasource in Grafana that targets the Thanos query (or the query-frontend if deployed):
It is also possible to tell Grafana the exact type of Prometheus datasource and to pass extra parameters to queries, for example to retrieve the automatically downsampled versions of the metrics based on the resolution:
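If Grafana is configured declaratively, the same datasource can be described in a provisioning file. A sketch using Grafana's datasource provisioning format; the URL and the query parameter are examples to adapt:
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus
    access: proxy
    url: http://thanos-query-frontend.monitoring.svc:9090
    isDefault: true
    jsonData:
      prometheusType: Thanos
      customQueryParameters: max_source_resolution=auto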
The final step is to employ it as a datasource, either for a panel or during a dashboard import:
Or even better, configure the new datasource as the default one, since Thanos query allows having multiple queryable Prometheus instances behind it.
The concluding article in our Thanos series delves into the scenarios where it’s pertinent to expose multiple Prometheus instances behind Thanos.
Do not miss our latest DevOps and Cloud Native blogposts! Follow Enix on Twitter!