Crash Free Users Widget on Projects view has incorrect value
Environment
SaaS (https://sentry.io/)
Steps to Reproduce
This was reported by a user but I see the behavior in our Sentry org as well.
- Go to a project's page that has session data.
- Note the Crash Free Users widget percentage value.
- Compare with the Crash Free Users graph below.
- See that the widget percentage value is not possible, and is below the lowest value on the Y axis on the graph.
Here's an example from our FE project:
Expected Result
Value should be accurate
Actual Result
Value is not accurate, and is also outside the range of the graph. Maybe the totals calculation in the API call is off?
Product Area
Projects - Project Details
Assigning to @getsentry/support for routing ⏲️
Routing to @getsentry/product-owners-projects-project-details for triage ⏲️
That sure does seem fishy. Lemme do a little investigation and see what I can find.
Okay, so it appears the problem is in the backend, in that the data is already wrong when it comes back in the response to the API request:
(It's hitting this endpoint.)
Another thing that's slightly mysterious: There are two requests to the endpoint, one of which comes back with data per day (as in the screenshot above - that's why there are so few entries in the series) and one with data per hour (as in the screenshot below). So far so good. But the overall crash-free rates returned by the two requests don't match, which... shouldn't they? (It's the number from the daily data response which is shown in the widget, though I'm not sure if that's significant.)
So something is definitely off in our calculations. @getsentry/sns, would this fall under your purview?
No, this is maybe @getsentry/telemetry-experience.
Commenting as requested by support to inform that we are experiencing the same issue, and it appears in both the "project" view and the "releases" views.
( If more info is required you can contact me directly )
As a temporary patch would it make sense to modify the code in projectStabilityScoreCard.tsx to calculate the average from the series locally in the client, as that data seems to be correct?
And once the backend is fixed then change it back?
@getsentry/telemetry-experience does this fall under your domain?
Hi, I see why this seems confusing, but I think that data might actually be correct.
Individual hours have a crash-free user rate of 92% at the lowest. So how come the 14d crash-free users number is 82%? It's because we don't average the hours over the 14d period and show a big number from that. Crash-free users is a set. That means that in hour 1 a different set of users may have experienced a crash than in hour 2 (even though both are at 92%). The 14d number looks at the entire period and gives you the percentage of users that never experienced a crash in that 14-day period (not in a single hour, and not an average of hours).
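To make the set behavior concrete, here is a minimal sketch with made-up numbers. It is not Sentry's actual calculation; `crash_free_rate` and the user IDs are hypothetical. It assumes the same 100 users are active in every hour and a different group of 8 crashes each hour:

```python
def crash_free_rate(active_users, crashed_users):
    """Percentage of active users who never crashed (hypothetical helper)."""
    active = set(active_users)
    crashed = set(crashed_users) & active
    return 100.0 * (len(active) - len(crashed)) / len(active)


# Assumption: the same 100 users return in each of three hours.
users = set(range(100))

# Assumption: a different group of 8 users crashes in each hour.
hourly_crashed = [set(range(0, 8)), set(range(8, 16)), set(range(16, 24))]

# Every individual hour looks the same: 92 of 100 users are crash-free.
hourly_rates = [crash_free_rate(users, crashed) for crashed in hourly_crashed]

# Over the whole period, a user counts as crashed if they crashed in ANY hour,
# so the crashed sets are unioned rather than averaged.
period_crashed = set().union(*hourly_crashed)
period_rate = crash_free_rate(users, period_crashed)

print(hourly_rates)  # [92.0, 92.0, 92.0]
print(period_rate)   # 76.0
```

With returning users, the period-wide number can sit below every hourly data point, because the crashed users accumulate across hours while the user base stays the same.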
Thanks for the feedback, I 100% agree that averaging the hours or days themselves would not be good, as that would not take into account the number of users that were in that time period.
That being said, to help me understand: in which case could the value for the whole set be lower than any of its data points? For instance, say we have 1101300 users and 12052 of them crash, but almost all of the crashes happened in the first hour/day. If we simply averaged, we would indeed get incorrect data; it needs at least to be a weighted average based on the number of users. But I don't see how the final "Total Crash Free Percentage" can be lower than any of the data points:
Sorry, I don't fully understand this, which is why I'm asking.
@matejminar is it possible to get an update on the previous comment from @dsolercigames?
If I understand correctly, this would happen not in the case I pointed out (when a user exists one day but not the other), but with returning users over multiple timeframes.
That being said what I think confused me is the crash-free users being displayed without decimals most of the time. Even if it says +0.178% increase or -0.387% decrease, it shows as 94% for instance, instead of appearing with three decimals like the crash-free session rate.
So no idea if there is an issue there? ( And thanks for the patience and taking time to respond )
Hi! Looks like the UI does not show decimal places when the crash-free rate is below 95%. https://github.com/getsentry/sentry/blob/a909e82c1e0808eddadf72069575cf4b6daf2062/static/app/views/releases/utils/index.tsx#L19 I think the original intention behind that was that no one cares if their app is 62.345% stable; those decimal places usually make a difference only in the high 90s. 👍
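For illustration only, the threshold behavior described above can be sketched like this. It is not the code from the linked file; `format_crash_free_percent` and its parameters are hypothetical names, and the default of three decimal places is an assumption based on the three-decimal display mentioned earlier in the thread:

```python
def format_crash_free_percent(percent, decimal_threshold=95.0, decimal_places=3):
    """Show decimals only when the rate is above the threshold (hypothetical sketch)."""
    if percent > decimal_threshold:
        return f"{percent:.{decimal_places}f}%"
    # At or below the threshold, decimals are dropped entirely.
    return f"{percent:.0f}%"


print(format_crash_free_percent(94.387))                          # 94%
print(format_crash_free_percent(97.1234))                         # 97.123%
print(format_crash_free_percent(94.387, decimal_threshold=90.0))  # 94.387%
```

Under this sketch, lowering the threshold to 90 is enough for a 94.387% rate to show its decimals.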
Hi, in our case we had the bad luck of a combined issue: faulty AMD drivers plus the dreaded faulty Intel 13900K and 14900K, causing our app to crash a lot.
Being able to see the progress as we enabled workarounds for the faulty hardware would have been very helpful for us. Is it possible to at least lower that threshold to the 90% mark?
( And thanks for taking all this time to answer )
No worries! Yeah, we can do that 👍