Cron monitoring not working as expected
Self-Hosted Version
24.7.0
CPU Architecture
x86_64
Docker Version
27.0.3
Docker Compose Version
2.28.1
Steps to Reproduce
I've created a new cron monitor with the schedule (*/10 * * * 1-6) and added a HTTP GET request with curl in out Windows Task Scheduler task (status=in_progress on start and status=ok on finish).
For testing purposes I sent the requests to our self hosted instance and sentry.io
Expected Result
On sentry.io the result looks like this:
Actual Result
On our self hosted instance, the ok status doesn't connect with the in_progress one:
I tried it with a check_in_id, but the result was the same.
We have other monitors which don't run every 10 minutes and they don't have this problem. Maybe there is a problem in the database? We had the postgres alpine image, but could reindex.
We've never used the cron monitor feature before, so I don't know if it did work on older versions.
Logs:
worker-1 | 08:18:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
worker-1 | 08:19:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
ingest-monitors-1 | 08:19:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 08:19:00+00:00')
ingest-monitors-1 | 08:19:01 [INFO] sentry: monitors.consumer.clock_tick_backfill (reference_datetime='2024-07-18 08:16:00+00:00')
ingest-monitors-1 | 08:19:01 [INFO] sentry: monitors.consumer.clock_tick_backfill (reference_datetime='2024-07-18 08:17:00+00:00')
ingest-monitors-1 | 08:19:01 [INFO] sentry: monitors.consumer.clock_tick_backfill (reference_datetime='2024-07-18 08:18:00+00:00')
monitors-clock-tick-1 | 08:19:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:16:00+00:00')
monitors-clock-tick-1 | 08:19:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:17:00+00:00')
monitors-clock-tick-1 | 08:19:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:18:00+00:00')
monitors-clock-tick-1 | 08:19:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:19:00+00:00')
worker-1 | 08:20:00 [INFO] sentry.dynamic_sampling.tasks.boost_low_volume_projects: boost_low_volume_projects (traceparent='4579702a6ce84c8885b198718c042963-96dc4fd43e990201-0' baggage='sentry-trace_id=4579702a6ce84c8885b198718c042963,sentry-environment=production,sentry-release=backend%4024.7.0%2Bunknown')
worker-1 | 08:20:00 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
worker-1 | 08:20:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
worker-1 | 08:20:00 [INFO] sentry.tasks.auto_ongoing_issues: auto_transition_issue_states queue size greater than 0. (size=3 task='schedule_auto_resolution')
worker-1 | 08:20:00 [INFO] sentry.tasks.auto_ongoing_issues: auto_transition_issue_states queue size greater than 0. (size=2 task='schedule_auto_transition_issues_new_to_ongoing')
worker-1 | 08:20:00 [INFO] sentry.tasks.auto_ongoing_issues: auto_transition_issues_new_to_ongoing started (first_seen_lte=1720686000 first_seen_lte_datetime=datetime.datetime(2024, 7, 11, 8, 20, tzinfo=datetime.timezone.utc))
worker-1 | 08:20:00 [INFO] sentry.dynamic_sampling.tasks.boost_low_volume_projects: boost_low_volume_projects_of_org (traceparent='4579702a6ce84c8885b198718c042963-9543c31dbcde5f27-0' baggage='sentry-trace_id=4579702a6ce84c8885b198718c042963,sentry-environment=production,sentry-release=backend%4024.7.0%2Bunknown')
nginx-1 | #.#.#.# - - [18/Jul/2024:08:20:00 +0000] "GET /api/22/cron/imp_lbase_ebel/10aad4be6e25ad9f0e0b0b8c01cd8f32/?status=in_progress HTTP/1.0" 202 0 "-" "curl/8.7.1" "#.#.#.#"
ingest-monitors-1 | 08:20:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 08:20:00+00:00')
monitors-clock-tick-1 | 08:20:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:20:00+00:00')
nginx-1 | #.#.#.# - - [18/Jul/2024:08:20:04 +0000] "GET /api/22/cron/imp_lbase_ebel/10aad4be6e25ad9f0e0b0b8c01cd8f32/?status=ok HTTP/1.0" 202 0 "-" "curl/8.7.1" "#.#.#.#"
postgres-1 | 2024-07-18 08:20:05.266 UTC [526] ERROR: duplicate key value violates unique constraint "sentry_environmentprojec_project_id_environment_i_91da82f2_uniq"
postgres-1 | 2024-07-18 08:20:05.266 UTC [526] DETAIL: Key (project_id, environment_id)=(22, 2) already exists.
postgres-1 | 2024-07-18 08:20:05.266 UTC [526] STATEMENT: INSERT INTO "sentry_environmentproject" ("project_id", "environment_id", "is_hidden") VALUES (22, 2, NULL) RETURNING "sentry_environmentproject"."id"
worker-1 | 08:21:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
redis-1 | 1:M 18 Jul 2024 08:21:55.728 * 10000 changes in 60 seconds. Saving...
redis-1 | 1:M 18 Jul 2024 08:21:55.729 * Background saving started by pid 899
redis-1 | 899:C 18 Jul 2024 08:21:55.776 * DB saved on disk
redis-1 | 899:C 18 Jul 2024 08:21:55.776 * RDB: 0 MB of memory used by copy-on-write
redis-1 | 1:M 18 Jul 2024 08:21:55.829 * Background saving terminated with success
worker-1 | 08:22:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
ingest-monitors-1 | 08:22:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 08:22:00+00:00')
ingest-monitors-1 | 08:22:01 [INFO] sentry: monitors.consumer.clock_tick_backfill (reference_datetime='2024-07-18 08:21:00+00:00')
monitors-clock-tick-1 | 08:22:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:21:00+00:00')
monitors-clock-tick-1 | 08:22:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:22:00+00:00')
monitors-clock-tasks-1 | 08:22:01 [INFO] sentry.monitors.clock_tasks.check_timeout: checkin_timeout (checkin_id=278)
nginx-1 | #.#.#.# - - [18/Jul/2024:08:22:43 +0000] "POST /api/1/envelope/?sentry_key=2f3ee29400cc47c19f6a705b5cdd8523&sentry_version=7&sentry_client=sentry.javascript.react%2F8.12.0-beta.0 HTTP/1.0" 200 2 "https://sentry.rmgruppe.de/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0" "#.#.#.#"
worker-1 | 08:23:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
ingest-monitors-1 | 08:23:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 08:23:00+00:00')
monitors-clock-tick-1 | 08:23:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 08:23:00+00:00')
Event ID
No response
Sometimes it does work:
Logs:
nginx-1 | #.#.#.# - - [18/Jul/2024:10:00:00 +0000] "GET /api/22/cron/imp_lbase_ebel/10aad4be6e25ad9f0e0b0b8c01cd8f32/?status=in_progress HTTP/1.0" 202 0 "-" "curl/8.7.1" #.#.#.#"
nginx-1 | #.#.#.# - - [18/Jul/2024:10:00:01 +0000] "POST /api/6/envelope/ HTTP/1.0" 200 2 "-" "-" "#.#.#.#"
ingest-monitors-1 | 10:00:04 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 10:00:00+00:00')
monitors-clock-tick-1 | 10:00:04 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 10:00:00+00:00')
nginx-1 | #.#.#.# - - [18/Jul/2024:10:00:09 +0000] "GET /api/22/cron/imp_lbase_ebel/10aad4be6e25ad9f0e0b0b8c01cd8f32/?status=ok HTTP/1.0" 202 0 "-" "curl/8.7.1" "#.#.#.#"
worker-1 | 10:01:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
ingest-monitors-1 | 10:01:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 10:01:00+00:00')
monitors-clock-tick-1 | 10:01:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 10:01:00+00:00')
worker-1 | 10:02:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
ingest-monitors-1 | 10:02:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 10:02:00+00:00')
monitors-clock-tick-1 | 10:02:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 10:02:00+00:00')
worker-1 | 10:03:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
ingest-monitors-1 | 10:03:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 10:03:00+00:00')
monitors-clock-tick-1 | 10:03:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 10:03:00+00:00')
redis-1 | 1:M 18 Jul 2024 10:03:45.712 * 10000 changes in 60 seconds. Saving...
redis-1 | 1:M 18 Jul 2024 10:03:45.713 * Background saving started by pid 2160
redis-1 | 2160:C 18 Jul 2024 10:03:45.786 * DB saved on disk
redis-1 | 2160:C 18 Jul 2024 10:03:45.786 * RDB: 0 MB of memory used by copy-on-write
redis-1 | 1:M 18 Jul 2024 10:03:45.813 * Background saving terminated with success
worker-1 | 10:04:00 [INFO] sentry.rules.delayed_processing: delayed_processing.project_id_list (project_ids='')
ingest-monitors-1 | 10:04:01 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-07-18 10:04:00+00:00')
monitors-clock-tick-1 | 10:04:01 [INFO] sentry.monitors.consumers.clock_tick_consumer: process_clock_tick (reference_datetime='2024-07-18 10:04:00+00:00')
If you were able to reindex your database, I believe your data should be fine and duplicate indexes should not be an issue here. I'm going to forward this to crons to see if they have more input
Assigning to @getsentry/support for routing ⏲️
Routing to @getsentry/product-owners-crons for triage ⏲️
@fmdoe Are there any duplicate rows in sentry_environment?
@wedamija no, every environment name is unique. But I think the database reindex would have failed if there were any duplicate keys
Hey @fmdoe, our team is taking a look and we'll get back to you soon with our findings.
Hi @fmdoe
Sorry for the delay in response here. I've had a look at your logs and screenshots, and while I'm not totally sure what's wrong, it looks like there's some kind of mismatch where the in_progress is being attributed to the wrong run in some cases.
One easy way to avoid this is to use check-in ids - this allows you to associate a uuid to a given run, so that there's no chance of a check-in being misattributed.
To do so, you can follow the docs here: https://docs.sentry.io/product/crons/getting-started/http/#overlapping-jobs-optional. While these docs mostly talk about overlapping jobs, this feature should also help you out here.
@wedamija Thanks for the response. With a progress id the check-ins don't time out anymore, but often there is no duration:
I think these are cases where the in_progress and ok didn't match before.
On sentry.io:
Hey @fmdoe,
We've seen this behavior when both check-ins (the in_progress and the ok ones) are sent around the same time, usually because the duration was extremely short.
For these types of jobs that run very fast, we recommend manually specifying the duration, you can just provide an extra query string to the request:
curl "${SENTRY_CRONS}?check_in_id=${CHECK_IN_ID}&status=ok&duration=1"
Duration should be in seconds. Please let me know if that works for you.