Sentry metrics appear to be sampled and therefore low-volume metrics frequently don't appear at Sentry servers
Environment
SaaS (https://sentry.io/)
Steps to Reproduce
Run a Java service with Sentry SDK v7.9.0 Have a piece of code that logs higher-volume metrics and lower-volume metrics Notice that while higher-volume metrics generally appear in Sentry dashboards, lower-volume metrics rarely make it.
Expected Result
Both high-volume and low-volume metrics are sent and are not sampled on the client side.
Actual Result
High volume metrics appear in the dashboard, although at much lower rates than the incidence of those metrics. Low volume metrics frequently don't appear on the dashboard nor in the list of emitted metrics under "Settings > {Org} > {Project} > Metrics". Sometimes those low-volume metrics appear days after the initial metric has been emitted.
Additionally, both types of metrics appear to be sampled (on the client?) since the values seen in the Sentry dashboards are much lower than from estimating event incidence by our logs.
Here's a sample of the events for Google Drive API "exportSizeLimitExceeded" error from our logs:
(base) backend-server-dev:~/backend-server-dev$ zgrep -c exportSizeLimitExceeded logs/backend.2024-0* | egrep -v ':0$'
logs/backend.2024-05-29.0.log.zip:6
logs/backend.2024-05-31.0.log.zip:4
logs/backend.2024-06-03.0.log.zip:4
logs/backend.2024-06-04.0.log.zip:2
Whereas here's the screenshot of the Sentry dashboard showing that metrics for the last 14d -- note that it only shows up once on May 31st
Likewise here are some counts from our preview environment of the number of files transferred via the Google Drive API from logs vs. the Sentry metric; note that the max counts/day are 200+ for the high-volume days where as our max value is 71 for that date range.
(base) backend-server-dev:~/backend-server-dev$ zgrep -c 'Transferring file' logs/backend.2024-0* |egrep -v ':0$'
logs/backend.2024-05-31.0.log.zip:257
logs/backend.2024-06-03.0.log.zip:217
logs/backend.2024-06-04.0.log.zip:30
logs/backend.2024-06-05.0.log.zip:21
logs/backend.2024-06-07.0.log.zip:44
logs/backend.2024-06-11.0.log.zip:105
logs/backend.2024-06-12.0.log.zip:128
Note that I've run the Java SDK with debug enabled, and the only messages re: metrics I see are the following types of messages; per your support team I was told I should see the names of captured metrics here but that doesn't appear to be the case for any of the environments I've enabled Sentry debug in:
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
DEBUG: Metrics: capturing metrics
DEBUG: Envelope sent successfully.
DEBUG: Envelope flushed
Product Area
DDM
Link
https://memyard.sentry.io/metrics/?interval=1d&project=4507091072385024&statsPeriod=30d&utc=true&widgets=%5B%7B%22mri%22%3A%22c%3Acustom%2Fgoogle_drive.transfer_file.start%40none%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A0%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%2C%7B%22mri%22%3A%22c%3Acustom%2Fgoogle_drive.transfer_file.success%40none%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A1%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%2C%7B%22mri%22%3A%22c%3Acustom%2Fgoogle_drive.transfer_file.failure%40none%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A2%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%2C%7B%22mri%22%3A%22d%3Acustom%2Fstorage_service.train_drive.available_files%40count%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A3%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%2C%7B%22mri%22%3A%22d%3Acustom%2Fstorage_service.train_drive.indexable_files%40count%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A4%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%2C%7B%22mri%22%3A%22d%3Acustom%2Fstorage_service.train_drive.transferred_files%40count%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A5%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%2C%7B%22mri%22%3A%22c%3Acustom%2Fgoogle_drive.transfer_file.failure.file_not_found%40none%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A6%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%2C%7B%22mri%22%3A%22c%3Acustom%2Fgoogle_drive.transfer_file.failure.export_size_limit_exceeded%40none%22%2C%22op%22%3A%22sum%22%2C%22query%22%3A%22%22%2C%22groupBy%22%3A%5B%5D%2C%22powerUserMode%22%3Afalse%2C%22type%22%3A1%2C%22id%22%3A7%2C%22displayType%22%3A%22line%22%2C%22focusedSeries%22%3A%5B%5D%2C%22sort%22%3A%7B%22order%22%3A%22asc%22%7D%2C%22isHidden%22%3Afalse%2C%22overlays%22%3A%5B%22samples%22%5D%7D%5D
DSN
https://8f29233cca683e7b0a93e6a2b859bb17@o4506853392973824.ingest.us.sentry.io/4507091072385024
Version
No response
┆Issue is synchronized with this Jira Improvement by Unito
Assigning to @getsentry/support for routing ⏲️
Routing to @getsentry/product-owners-ddm for triage ⏲️
Thank you for raising this, we're looking into it 🙏
@rafalmemyard Thanks for reaching out! The Java SDK does not sample metrics on the client side. Only if (sentry) server side rate limiting is in effect, metrics could get dropped. But as far as I can see that's not the case for you.
Regarding the log messages: We do not log individual metrics, as this could be overwhelming, but rather log a summary on every flush. If you emit n different metrics you should see a log similar to this:
DEBUG: Metrics: flushing <n> buckets
Maybe that's something you can verify?
Apart from that and depending on your setup you might need to call Sentry.close() which will flush any unsent metrics.
@rafalmemyard Thanks for reaching out! The Java SDK does not sample metrics on the client side. Only if (sentry) server side rate limiting is in effect, metrics could get dropped. But as far as I can see that's not the case for you.
OK, thanks for checking on that.
Regarding the log messages: We do not log individual metrics, as this could be overwhelming, but rather log a summary on every flush. If you emit
ndifferent metrics you should see a log similar to this:
DEBUG: Metrics: flushing <n> bucketsMaybe that's something you can verify?
I do see those messages, but the number of buckets is always 1, even though I expect I should see multiple metrics flushed in some cases where I log '....started', '....completed', '....error.
I also never see more than 1 of the flushing <n> buckets messages in a row, generally they look like this:
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
DEBUG: Metrics: capturing metrics
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
DEBUG: Metrics: capturing metrics
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
DEBUG: Metrics: capturing metrics
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
DEBUG: Metrics: capturing metrics
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
DEBUG: Metrics: capturing metrics
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
DEBUG: Metrics: capturing metrics
DEBUG: Metrics: nothing to flush
DEBUG: Metrics: flushing 1 buckets
Is there some significance to the DEBUG: capturing metrics message? Ie, it should have been capturing metrics the whole time.
Apart from that and depending on your setup you might need to call
Sentry.close()which will flush any unsent metrics.
I assume this shuts down the SDK totally, and since this is a long-running service that just won't do. It appears that I could maybe call Sentry.flush() at the end of some of the functions I call in a thread pool -- but at least some of the JavaDocs say the method is unimplemented in Java SDK 7.9.0 (though it looks like the internals call it and the definitions I found don't look like they throw any unsupported exceptions). Thoughts on this?
@rafalmemyard Thanks for the additional details, this sounds awfully similar to this issue which appeared in 7.9.0
Could you try the previous "working" Java SDK version 7.8.0?
@rafalmemyard Thanks for the additional details, this sounds awfully similar to this issue which appeared in
7.9.0Could you try the previous "working" Java SDK version
7.8.0?
Will definitely try downgrading to 7.8.0 and report -- I think I initially saw some issues with 7.8.0 and then tried to upgrade in hopes things would get better, but that ticket does hit very close to what I'm seeing.
@rafalmemyard Thanks for the additional details, this sounds awfully similar to this issue which appeared in
7.9.0Could you try the previous "working" Java SDK version7.8.0?Will definitely try downgrading to 7.8.0 and report -- I think I initially saw some issues with 7.8.0 and then tried to upgrade in hopes things would get better, but that ticket does hit very close to what I'm seeing.
I still think I'm seeing way fewer metrics with 7.8.0 than I should -- @markushi can you let me know when 7.11.0 (or whichever version has your fix from https://github.com/getsentry/sentry-java/pull/3493) ships?
@rafalmemyard we're aiming to get the fix out early this week. I'll keep you posted!
@rafalmemyard 7.11.0 is out, could you give it a try and let us know if this got fixed?
I've just done one test locally with 7.11.0 and it's already looking better than 7.8.0 or later versions. Will deploy to our staging apps to verify that things look better there as well and re-ping you all, but the initial results look good!
I also noticed this not just for metrics but for regular events. I am using python sdk in this case. See reported issue https://github.com/getsentry/relay/issues/3873
Thinking this is a relay issue, but haven't yet looked into the sdk itself. Will be looking into python sdk next .
@chipzzz thanks for chiming in and the detailed report. I believe you're experiencing a different issue which is tracked in https://github.com/getsentry/relay/issues/3873 and I'm gonna close this one since the fix resolved the problems reported here. In case something related to this comes up again, feel free to comment here.