Prometheus Endpoint

Each node in a CockroachDB cluster exports granular time-series metrics at two available endpoints: _status/vars and metrics.

The metrics are formatted for integration with Prometheus, an open source tool for storing, aggregating, and querying time-series data. For details on how to pull these metrics into Prometheus, refer to Monitor CockroachDB with Prometheus. The Prometheus format is human-readable and can be processed to work with other Prometheus-compatible third-party monitoring systems such as Sysdig and Google Cloud Managed Service for Prometheus. Many of the third-party monitoring integrations, such as Datadog and Kibana, collect metrics from a cluster's Prometheus endpoint.

Note:

In addition to using the exported time-series data to monitor a cluster through an external system, you can write alerting rules to ensure prompt notification of critical events or issues that may require intervention or investigation. Refer to Essential Alerts for more details.

If you rely on external tools for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you disable the DB Console's storage of time-series metrics.

When storage of time-series metrics is disabled, the Metrics dashboards in the DB Console are still available, but their visualizations are blank because the data they rely on is no longer stored.

_status/vars

To access the _status/vars Prometheus endpoint of a cluster running on localhost:8080:

$ curl http://localhost:8080/_status/vars

The output will be similar to the following. Note that each sql_*_count* metric has its own unique metric name.

# HELP sys_cgocalls Total number of cgo calls
# TYPE sys_cgocalls counter
sys_cgocalls{node_id="1",tenant="demoapp"} 13737
# HELP sys_cpu_sys_percent Current system cpu percentage consumed by the CRDB process
# TYPE sys_cpu_sys_percent gauge
sys_cpu_sys_percent{node_id="1",tenant="demoapp"} 0.0021986027879282717
...
# HELP sql_select_count_internal Number of SQL SELECT statements successfully executed (internal queries)
# TYPE sql_select_count_internal counter
sql_select_count_internal{node_id="1",tenant="demoapp"} 2115
...
# HELP sql_delete_count Number of SQL DELETE statements successfully executed
# TYPE sql_delete_count counter
sql_delete_count{node_id="1",tenant="demoapp"} 0
...
# HELP sql_delete_count_internal Number of SQL DELETE statements successfully executed (internal queries)
# TYPE sql_delete_count_internal counter
sql_delete_count_internal{node_id="1",tenant="demoapp"} 996
...
# HELP sql_select_count Number of SQL SELECT statements successfully executed
# TYPE sql_select_count counter
sql_select_count{node_id="1",tenant="demoapp"} 9
...
# HELP sql_insert_count_internal Number of SQL INSERT statements successfully executed (internal queries)
# TYPE sql_insert_count_internal counter
sql_insert_count_internal{node_id="1",tenant="demoapp"} 1201
...
# HELP sql_update_count Number of SQL UPDATE statements successfully executed
# TYPE sql_update_count counter
sql_update_count{node_id="1",tenant="demoapp"} 0
...
# HELP sql_update_count_internal Number of SQL UPDATE statements successfully executed (internal queries)
# TYPE sql_update_count_internal counter
sql_update_count_internal{node_id="1",tenant="demoapp"} 1907
...
# HELP sql_insert_count Number of SQL INSERT statements successfully executed
# TYPE sql_insert_count counter
sql_insert_count{node_id="1",tenant="system"} 12
sql_insert_count{node_id="1",tenant="demoapp"} 15
...
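As an illustration of how this text format can be consumed, the following minimal Python sketch reads the "# TYPE" lines from a shortened copy of the output above and lists the names that match sql_*_count*. The names SAMPLE and metric_types are hypothetical, and the parsing assumes the simple "# TYPE <name> <type>" layout shown above rather than the full Prometheus exposition grammar.

```python
from fnmatch import fnmatchcase

# Shortened copy of the _status/vars output shown above.
SAMPLE = """\
# HELP sys_cgocalls Total number of cgo calls
# TYPE sys_cgocalls counter
sys_cgocalls{node_id="1",tenant="demoapp"} 13737
# HELP sys_cpu_sys_percent Current system cpu percentage consumed by the CRDB process
# TYPE sys_cpu_sys_percent gauge
sys_cpu_sys_percent{node_id="1",tenant="demoapp"} 0.0021986027879282717
# HELP sql_select_count_internal Number of SQL SELECT statements successfully executed (internal queries)
# TYPE sql_select_count_internal counter
sql_select_count_internal{node_id="1",tenant="demoapp"} 2115
# HELP sql_select_count Number of SQL SELECT statements successfully executed
# TYPE sql_select_count counter
sql_select_count{node_id="1",tenant="demoapp"} 9
"""


def metric_types(text):
    """Map each metric name to the type declared on its '# TYPE' line."""
    types = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[:2] == ["#", "TYPE"]:
            types[parts[2]] = parts[3]
    return types


types = metric_types(SAMPLE)
# Each statement type, and each internal variant, is a separate metric name.
sql_counters = sorted(n for n in types if fnmatchcase(n, "sql_*_count*"))
print(sql_counters)
```

In this format, a dashboard or script that wants every SQL statement counter has to discover them by name pattern, since nothing else ties the metrics together.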

metrics

New in v25.3:

Note:

This feature is in preview and subject to change. To share feedback and/or issues, contact Support.

The metrics endpoint is served at the /metrics path, which is commonly used by Prometheus targets and is the default metrics_path in Prometheus scrape configurations.

To access the metrics Prometheus endpoint of a cluster running on localhost:8080:

$ curl http://localhost:8080/metrics

The output will be similar to the following. Note that there is a single metric name, sql_count, with static labels for query_type (with values of insert, select, update, and delete) and query_internal (with a value of true, present only on internal queries).

# HELP sys_cgocalls Total number of cgo calls
# TYPE sys_cgocalls counter
sys_cgocalls{node_id="1",tenant="demoapp"} 13737
# HELP sys_cpu_sys_percent Current system cpu percentage consumed by the CRDB process
# TYPE sys_cpu_sys_percent gauge
sys_cpu_sys_percent{node_id="1",tenant="demoapp"} 0.0021986027879282717
...
# HELP sql_count Number of SQL INSERT statements successfully executed (internal queries)
# TYPE sql_count counter
sql_count{node_id="1",tenant="demoapp",query_type="insert",query_internal="true"} 1281
sql_count{node_id="1",tenant="demoapp",query_type="delete"} 0
sql_count{node_id="1",tenant="demoapp",query_type="update"} 0
sql_count{node_id="1",tenant="demoapp",query_type="select",query_internal="true"} 2280
sql_count{node_id="1",tenant="demoapp",query_type="select"} 9
sql_count{node_id="1",tenant="demoapp",query_type="insert"} 15
sql_count{node_id="1",tenant="demoapp",query_type="update",query_internal="true"} 2102
sql_count{node_id="1",tenant="demoapp",query_type="delete",query_internal="true"} 1067
...
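To show how the labels can be used once the output is scraped, here is a minimal Python sketch that parses the sql_count samples above and sums them per query_type, separately for user-issued and internal queries. The helper names (parse_samples, sum_by_label) are hypothetical, and the regular expressions are an assumption that covers only the simple label syntax shown here (no escaped quotes or timestamps).

```python
import re
from collections import defaultdict

# The sql_count samples from the metrics output shown above.
SAMPLE = """\
sql_count{node_id="1",tenant="demoapp",query_type="insert",query_internal="true"} 1281
sql_count{node_id="1",tenant="demoapp",query_type="delete"} 0
sql_count{node_id="1",tenant="demoapp",query_type="update"} 0
sql_count{node_id="1",tenant="demoapp",query_type="select",query_internal="true"} 2280
sql_count{node_id="1",tenant="demoapp",query_type="select"} 9
sql_count{node_id="1",tenant="demoapp",query_type="insert"} 15
sql_count{node_id="1",tenant="demoapp",query_type="update",query_internal="true"} 2102
sql_count{node_id="1",tenant="demoapp",query_type="delete",query_internal="true"} 1067
"""

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples for labeled sample lines."""
    samples = []
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            name, raw_labels, value = match.groups()
            samples.append((name, dict(LABEL_RE.findall(raw_labels)), float(value)))
    return samples


def sum_by_label(samples, metric, label, internal):
    """Sum a metric per label value, for internal or user-issued queries."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        is_internal = labels.get("query_internal") == "true"
        if name == metric and is_internal == internal:
            totals[labels.get(label, "")] += value
    return dict(totals)


samples = parse_samples(SAMPLE)
user = sum_by_label(samples, "sql_count", "query_type", internal=False)
internal = sum_by_label(samples, "sql_count", "query_type", internal=True)
print("user-issued:", user)
print("internal:", internal)
```

Because query_internal appears only on internal queries in this output, the sketch treats a missing label as a user-issued query.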

Static labels

Static labels allow segmentation of a metric across various facets for later querying and aggregation.

Unlabeled metrics from the _status/vars endpoint | Labeled metrics from the metrics endpoint
------------------------------------------------ | -----------------------------------------
sql_insert_count                                 | sql_count{query_type="insert"}
sql_select_count                                 | sql_count{query_type="select"}
sql_update_count                                 | sql_count{query_type="update"}
sql_delete_count                                 | sql_count{query_type="delete"}
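The mapping above is regular enough to express in code. The following Python sketch translates a _status/vars metric name into the corresponding labeled selector. The function to_labeled_selector is a hypothetical helper, not part of CockroachDB or Prometheus, and it assumes the naming pattern holds only for the four statement types and the _internal suffix shown on this page.

```python
import re

# Matches the four statement counters, with an optional _internal suffix.
LEGACY_RE = re.compile(r"^sql_(insert|select|update|delete)_count(_internal)?$")


def to_labeled_selector(legacy_name):
    """Translate a _status/vars metric name into a labeled sql_count selector.

    Returns None for names outside the four statement types shown above.
    """
    match = LEGACY_RE.match(legacy_name)
    if match is None:
        return None
    labels = [f'query_type="{match.group(1)}"']
    if match.group(2):
        # Internal queries carry an extra static label.
        labels.append('query_internal="true"')
    return "sql_count{" + ",".join(labels) + "}"


for name in ("sql_insert_count", "sql_delete_count_internal", "sys_cgocalls"):
    print(name, "->", to_labeled_selector(name))
```

A helper like this might be useful when migrating existing dashboards or alert expressions from one endpoint to the other.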

At metrics query time, labels provide a smoother user experience:

Unlabeled sum query from the _status/vars endpoint | Labeled sum query from the metrics endpoint
--- | ---
sum(sql_insert_count, sql_delete_count, sql_select_count) | sum(sql_count)
This query must be modified if new types are added because they will have new metric names. | This query is resilient to new type additions.
Related metrics can be found via autocomplete in a third-party tool, but it may be unclear which metrics belong together. | All label values can be found through a third-party query engine and used to easily construct a graph with individual lines for each label value.
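The resilience argument can be sketched with plain Python data standing in for scraped samples. Everything here is illustrative: the "upsert" entries are invented to represent a statement type added later, and sum_named, sum_metric, and label_values are hypothetical helpers that mimic what a query engine does, not PromQL itself.

```python
# Samples as (metric name, labels, value). The "upsert" entries are invented
# for illustration; they stand in for a statement type added in the future.
UNLABELED = [
    ("sql_insert_count", {}, 15.0),
    ("sql_delete_count", {}, 0.0),
    ("sql_select_count", {}, 9.0),
    ("sql_upsert_count", {}, 4.0),  # hypothetical new metric name
]
LABELED = [
    ("sql_count", {"query_type": "insert"}, 15.0),
    ("sql_count", {"query_type": "delete"}, 0.0),
    ("sql_count", {"query_type": "select"}, 9.0),
    ("sql_count", {"query_type": "upsert"}, 4.0),  # hypothetical new label value
]

# The names hard-coded into the unlabeled query.
KNOWN_NAMES = {"sql_insert_count", "sql_delete_count", "sql_select_count"}


def sum_named(samples, names):
    """Sum only the metrics whose names were listed explicitly."""
    return sum(value for name, _, value in samples if name in names)


def sum_metric(samples, metric):
    """Sum every sample of one metric, whatever its label values."""
    return sum(value for name, _, value in samples if name == metric)


def label_values(samples, metric, label):
    """List the distinct values of a label, e.g. one graph line per value."""
    return sorted(
        {labels[label] for name, labels, _ in samples if name == metric and label in labels}
    )


print(sum_named(UNLABELED, KNOWN_NAMES))   # misses the new metric name
print(sum_metric(LABELED, "sql_count"))    # picks up the new label value
print(label_values(LABELED, "sql_count", "query_type"))
```

The name-based sum silently omits the new type until someone edits the list, while the label-based sum and the label-value listing include it without changes.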

Another common scenario occurs when the label values represent disjoint categories. For example, the certificate expiration metrics differ only in the certificate they refer to. Operators are unlikely to aggregate these, but may still want to view all certificate expiration metrics on a dashboard.

For example, the output from the metrics endpoint will be similar to the following:

# HELP security_certificate_expiration Expiration for the CA certificate
# TYPE security_certificate_expiration gauge
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ca"} 1.998766953e+09
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ca-client-tenant"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="node-client"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client-tenant"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ui"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client"} 1.840654953e+09
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client-ca"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ui-ca"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="node"} 1.840654953e+09
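As a sketch of how such a dashboard view could be assembled, the following Python example reads a few of the samples above and produces one expiration time per certificate_type. It rests on two assumptions that this page does not state: that the value is a Unix timestamp in seconds, and that 0 means no certificate is configured. The names SAMPLE, CERT_RE, and certificate_expirations are hypothetical.

```python
import re
from datetime import datetime, timezone

# A subset of the security_certificate_expiration samples shown above.
SAMPLE = """\
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ca"} 1.998766953e+09
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ui"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client"} 1.840654953e+09
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="node"} 1.840654953e+09
"""

CERT_RE = re.compile(
    r'^security_certificate_expiration\{.*certificate_type="([^"]+)".*\}\s+(\S+)$'
)


def certificate_expirations(text):
    """Map certificate_type to a UTC expiration time, skipping zero values.

    Assumes the value is a Unix timestamp in seconds and 0 means "not set".
    """
    expirations = {}
    for line in text.splitlines():
        match = CERT_RE.match(line.strip())
        if match and float(match.group(2)) > 0:
            expirations[match.group(1)] = datetime.fromtimestamp(
                float(match.group(2)), tz=timezone.utc
            )
    return expirations


expirations = certificate_expirations(SAMPLE)
# One row per certificate, soonest expiration first.
for cert_type, expires in sorted(expirations.items(), key=lambda item: item[1]):
    print(f"{cert_type}: {expires.isoformat()}")
```

Each label value stays a separate row here rather than being summed, which matches the point that these categories are viewed side by side and not aggregated.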
