Monitoring
Transiter includes Prometheus metrics for monitoring a running deployment. The main thing you are likely interested in is the success of feed updates. If feed updates are failing, the data being served by your Transiter instance will be stale.
This page describes the most interesting metrics. All of the metrics are defined in this Go source file. If you have an idea for a new metric, please file an issue or create a PR!
Feed update metrics
All of these metrics contain at least 3 labels: system_id, feed_id and feed_type.
The feed type (GTFS_STATIC, GTFS_REALTIME, etc.) can be useful for creating graphs that target only realtime feeds.
Last completed feed update
The gauge metric transiter_feed_update_last_completed reports the Unix timestamp of when the last feed update completed.
This is partitioned by a label result, which can be one of SUCCESS, FAILURE or SKIPPED.
Feed updates are skipped when the transit agency's data hasn't changed.
Example graphs
Number of seconds since each realtime feed last successfully updated:
time() - (max by (system_id, feed_id) (transiter_feed_update_last_completed{feed_type="GTFS_REALTIME", result!="FAILURE"})) / 1000
Number of seconds since new data was obtained for each realtime feed:
time() - (max by (system_id, feed_id) (transiter_feed_update_last_completed{feed_type="GTFS_REALTIME", result="SUCCESS"})) / 1000
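To make the arithmetic in these two queries concrete, here is a minimal Python sketch of the same calculation over samples that have already been scraped. The Sample type, the seconds_since_last_update function and the example system and feed IDs are hypothetical names made up for illustration. The division by 1000 simply mirrors the queries above, which suggests the gauge is reported in milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scraped sample of transiter_feed_update_last_completed (hypothetical type)."""
    system_id: str
    feed_id: str
    result: str
    value: float  # assumed to be in milliseconds, as the "/ 1000" in the queries implies


def seconds_since_last_update(samples, now, results=("SUCCESS", "SKIPPED")):
    """Seconds since the latest update per (system_id, feed_id), counting only `results`."""
    latest = {}
    for s in samples:
        if s.result not in results:
            continue
        key = (s.system_id, s.feed_id)
        # Equivalent of "max by (system_id, feed_id)".
        latest[key] = max(latest.get(key, s.value), s.value)
    return {key: now - value / 1000 for key, value in latest.items()}


# Made-up samples for a single feed; `now` is in seconds, like PromQL's time().
samples = [
    Sample("us-ny-subway", "1234567", "SUCCESS", 1_700_000_000_000),
    Sample("us-ny-subway", "1234567", "SKIPPED", 1_700_000_030_000),
    Sample("us-ny-subway", "1234567", "FAILURE", 1_700_000_045_000),
]
print(seconds_since_last_update(samples, now=1_700_000_060))
```

Passing results=("SUCCESS",) corresponds to the second query, which measures the time since new data was actually obtained rather than the time since the last non-failing update.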
Number of feed updates, partitioned by status
The counter metric transiter_feed_update_count reports the number of feed updates.
It has a status label which reports the status of the feed update.
This can be UPDATED, SKIPPED or another status like FAILED_DOWNLOAD_ERROR, which denotes a specific kind of error.
Example graph: number of failed feed updates per minute:
rate(transiter_feed_update_count{status!="UPDATED", status!="SKIPPED"}[5m]) * 60
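As a rough illustration of what this query measures, the sketch below computes failures per minute from two scrapes of the counter. The failures_per_minute function and the counter values are hypothetical, the excluded statuses mirror the label filter in the query above, and the result is summed across failure statuses, whereas the query returns one series per label combination. Counter resets and the extrapolation done by Prometheus's rate are ignored.

```python
def failures_per_minute(before, after, elapsed_seconds, ok_statuses=("UPDATED", "SKIPPED")):
    """Failed feed updates per minute between two scrapes.

    `before` and `after` map a status label to the counter value at each scrape.
    """
    failed = 0
    for status, value in after.items():
        if status in ok_statuses:
            continue
        # A status missing from the first scrape started from zero.
        failed += value - before.get(status, 0)
    return failed * 60 / elapsed_seconds


# Made-up counter values scraped 5 minutes (300 seconds) apart.
before = {"UPDATED": 100, "SKIPPED": 900, "FAILED_DOWNLOAD_ERROR": 4}
after = {"UPDATED": 110, "SKIPPED": 980, "FAILED_DOWNLOAD_ERROR": 9}
print(failures_per_minute(before, after, elapsed_seconds=300))  # 1.0
```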
Latency of successful feed updates
The distribution metric transiter_feed_update_success_latency reports how long successful feed updates took.
Like all Go Prometheus distribution metrics, it has two counters associated with it:
transiter_feed_update_success_latency_count: number of data points counter, which counts the number of successful feed updates.
transiter_feed_update_success_latency_sum: sum of all latencies for all successful feed updates.
Example graphs
Successful feed updates per minute:
rate(transiter_feed_update_success_latency_count[5m]) * 60
Average update time over a 5 minute window:
rate(transiter_feed_update_success_latency_sum[5m]) / rate(transiter_feed_update_success_latency_count[5m])
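The average-latency query divides the growth of the _sum counter by the growth of the _count counter over the same window. A minimal Python sketch of that arithmetic follows. The average_latency function and the counter values are hypothetical, and counter resets are ignored.

```python
def average_latency(sum_start, sum_end, count_start, count_end):
    """Average latency of the updates that completed between two scrapes.

    Returns None when no update completed in the window, the case in which
    the PromQL expression divides zero by zero.
    """
    completed = count_end - count_start
    if completed <= 0:
        return None
    return (sum_end - sum_start) / completed


# Made-up counter values scraped 5 minutes apart: 60 updates took 30 units of time in total.
print(average_latency(sum_start=120.0, sum_end=150.0, count_start=400, count_end=460))  # 0.5
```

The result is in whatever unit the metric records latencies in.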
API response metric
There is currently one distribution metric, transiter_public_request_latency, which reports the latency of public API responses. Its _bucket and _count series are used in the graphs below.
Note that this latency only counts the time Transiter takes to handle the request.
It does not count how long it takes the request to reach the server, transit through any reverse proxies, be parsed by the Go HTTP package, etc.
Example graphs
50th, 90th and 95th percentile response times for the "get stop" endpoint:
histogram_quantile(0.5, sum(rate(transiter_public_request_latency_bucket{method_name="GetStop"}[5m])) by (le))
histogram_quantile(0.9, sum(rate(transiter_public_request_latency_bucket{method_name="GetStop"}[5m])) by (le))
histogram_quantile(0.95, sum(rate(transiter_public_request_latency_bucket{method_name="GetStop"}[5m])) by (le))
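The percentiles above are estimates: histogram_quantile finds the bucket containing the requested rank and interpolates linearly inside it. The sketch below is a simplified Python version of that interpolation, not Prometheus's actual implementation. It skips edge cases such as non-positive bucket bounds, and the bucket bounds and counts are made up rather than taken from Transiter.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the math.inf bucket, like the le-labelled series.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == math.inf:
        # The quantile lies beyond the highest finite bound, so report that bound.
        return buckets[index - 1][0] if index > 0 else math.nan
    lower, lower_count = buckets[index - 1] if index > 0 else (0.0, 0)
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - lower_count) / (count - lower_count)


# Made-up cumulative bucket counts for 100 requests.
buckets = [(0.01, 50), (0.05, 80), (0.1, 95), (0.5, 100), (math.inf, 100)]
for q in (0.5, 0.9, 0.95):
    print(q, histogram_quantile(q, buckets))
```

Because of the interpolation, the accuracy of each percentile depends on how finely the bucket boundaries divide the range where most requests fall.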
Requests per second by method:
rate(transiter_public_request_latency_count[1m])
Request errors per minute:
rate(transiter_public_request_latency_count{response_status != "OK"}[10m]) * 60