Skip to content

Write a PrometheusRule

AlertingRule

Allows you to send alerts to Alertmanager.

Simple example of a configuration file for AlertingRule

groups:
- name: client-alerting-rules
  rules:
  - alert:  PodCPUUsage
    annotations:
      description: Container CPU usage is above 80%
      message: CPU usage
      help_alerts: <url link to help>
      help_alerts2: <url link to help>
    expr:  (sum(rate(container_cpu_usage_seconds_total{image!="", container_name!=""}[3m])) BY (instance, name, pod_name, container_name, cc_prom_source) * 100) > 80
    for: 10m
    labels:
      severity: critical

AlertingRule format

  • alert: <alert name> : alert name is in CamelCase (with first letter uppercase too)
  • expr: <PromQL expr> is the expression in PromQL that we want to evaluate at regular intervals
  • for: <duration> : initial duration while the alert is not showed/notified. This prevents sending notifications if the problem solved itself in a short time. Example : loss of a ping packet. In this case you may want to wait 1 minute before alerting in case that only one network packet was lost.
  • labels: <labels set> : additionnal alert labels. It allows for example to define the severity label (here critical). We can put the labels we want.
  • annotations: <annotations set> : used to describe the alert with description, message. You can specify any key for the annotation, an example here with help_alerts and help_alerts2 which are urls link to help.

Alerts are grouped into groups.

In a group alerts will be evaluated at the same time, and will be grouped together when sending notifications.

This is useful when alerts fire because of the same root cause.

RecordingRule

Sometimes alert expressions can be complex and expensive.

Recording rules allow you to pre-compute frequently needed or computationally expensive expressions and save their result as a new set of time series.

We can use it as metrics in alert expression or dashboard.

Simple example of a configuration file for RecordingRule

groups:
- name: node-exporter.rules
  rules:
  - expr: |-
      (count without (cpu) (
        count without (mode) (
          node_cpu_seconds_total{job="node-exporter"}
        )
      )
      )
    labels:
      example: additionallabel
    record: instance:node_num_cpu:sum

RecordingRule format

  • expr: <PromQL expr> is the expression in PromQL that we want to install evaluate at regular intervals
  • labels: <labels set> : additional recordingRule labels. We can put the labels we want. Here example: additionallabel will be a label of new time series.
  • record : name of the metric.

record name should be of the general form level:metric:operations.

  • level represents the aggregation level and labels of the rule output.
  • metric is the metric name and should be unchanged other than stripping _total off counters when using rate() or irate().
  • operations is a list of operations that were applied to the metric, newest operation first.