Write a PrometheusRule¶
AlertingRule¶
Allows you to send alerts to Alertmanager.
Simple example of a configuration file for AlertingRule¶
groups:
- name: client-alerting-rules
rules:
- alert: PodCPUUsage
annotations:
description: Container CPU usage is above 80%
message: CPU usage
help_alerts: <url link to help>
help_alerts2: <url link to help>
expr: (sum(rate(container_cpu_usage_seconds_total{image!="", container_name!=""}[3m])) BY (instance, name, pod_name, container_name, cc_prom_source) * 100) > 80
for: 10m
labels:
severity: critical
AlertingRule format¶
alert: <alert name>
: alert name is in CamelCase (with first letter uppercase too)expr: <PromQL expr>
is the expression in PromQL that we want to evaluate at regular intervalsfor: <duration>
: initial duration while the alert is not showed/notified. This prevents sending notifications if the problem solved itself in a short time. Example : loss of a ping packet. In this case you may want to wait 1 minute before alerting in case that only one network packet was lost.labels: <labels set>
: additionnal alert labels. It allows for example to define the severity label (herecritical
). We can put the labels we want.annotations: <annotations set>
: used to describe the alert withdescription
,message
. You can specify any key for the annotation, an example here withhelp_alerts
andhelp_alerts2
which are urls link to help.
Alerts are grouped into groups
.
In a group alerts will be evaluated at the same time, and will be grouped together when sending notifications.
This is useful when alerts fire because of the same root cause.
RecordingRule¶
Sometimes alert expressions can be complex and expensive.
Recording rules allow you to pre-compute frequently needed or computationally expensive expressions and save their result as a new set of time series.
We can use it as metrics in alert expression or dashboard.
Simple example of a configuration file for RecordingRule¶
groups:
- name: node-exporter.rules
rules:
- expr: |-
(count without (cpu) (
count without (mode) (
node_cpu_seconds_total{job="node-exporter"}
)
)
)
labels:
example: additionallabel
record: instance:node_num_cpu:sum
RecordingRule format¶
expr: <PromQL expr>
is the expression in PromQL that we want to install evaluate at regular intervalslabels: <labels set>
: additional recordingRule labels. We can put the labels we want. Hereexample: additionallabel
will be a label of new time series.record
: name of the metric.
record
name should be of the general form level:metric:operations
.
level
represents the aggregation level and labels of the rule output.metric
is the metric name and should be unchanged other than stripping_total
off counters when usingrate()
orirate()
.operations
is a list of operations that were applied to the metric, newest operation first.