Add UR estimation to the osu!mania ruleset
Massive thanks to Frost for the majority of the math behind this rework, and to Evening, Molneya, and Komirin for guidance.
Estimates Unstable Rate and replaces Accuracy with it
Changes
- Raw accuracy percentages are no longer used to scale performance
- Instead, the UR of the score is estimated from the play's judgements and the sizes of the hit windows, and is used to scale performance in a hit-window-agnostic way
Reasoning
- Between Scorev1, Scorev2, and osu!lazer Score, the way hit windows work differs massively. Because of this, the same play will receive a different accuracy value depending on which system is used.
- Performance previously did not scale with OD.
- Unstable rate is an easier metric to work with, as there is a true "perfect" value.
Estimation Theory
In order to estimate UR, we assume all hit errors are normally distributed, with a mean of 0 and a standard deviation σ. For a given σ, this gives us the probability that any given hit lands in a given hit window. We can compare these probabilities to the actual proportions of each judgement in a play, and return whichever σ value is the closest match.
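As a rough illustration of the idea above (not the code in this PR), the estimation can be sketched in Python. The hit window half-widths, the grid-search bounds, and the function names are all hypothetical, and "closest match" is interpreted here as the σ that maximises the likelihood of the observed judgement counts. UR is taken as 10 times the standard deviation of the hit errors in milliseconds.

```python
import math


def window_probabilities(sigma, windows):
    """Probability that a N(0, sigma) hit error lands in each judgement band.

    `windows` holds ascending half-widths in ms (hypothetical values in the
    usage below). The last entry returned is the probability of a miss.
    """
    # For a zero-mean normal, P(|error| <= w) = erf(w / (sigma * sqrt(2))).
    cumulative = [math.erf(w / (sigma * math.sqrt(2))) for w in windows]
    bands = [cumulative[0]]
    bands += [cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))]
    bands.append(1.0 - cumulative[-1])
    return bands


def log_likelihood(sigma, windows, counts):
    """Log-likelihood of the observed judgement counts for a candidate sigma."""
    probs = window_probabilities(sigma, windows)
    # Floor each probability so an empty band never produces log(0).
    return sum(c * math.log(max(p, 1e-300)) for c, p in zip(counts, probs))


def estimate_ur(windows, counts, lo=0.5, hi=200.0, steps=4000):
    """Grid-search sigma (ms) over [lo, hi] and return the estimated UR."""
    candidates = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    best_sigma = max(candidates, key=lambda s: log_likelihood(s, windows, counts))
    return 10.0 * best_sigma


# Hypothetical half-widths (ms) and judgement counts, best to worst, then misses.
example_windows = [16.0, 40.0, 73.0, 103.0, 127.0]
example_counts = [7100, 2800, 90, 8, 2, 0]
print(estimate_ur(example_windows, example_counts))
```

A grid search keeps the sketch dependency-free; a real implementation would more likely use a proper one-dimensional optimiser.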
Further documentation can be found in a Google Doc here.
Considerations
- Mania hit windows may potentially change as mania in lazer is ironed out. This will require addressing and an update to the current tests.
- Classic mod doesn't change the results of a replay, forcing me to use a workaround that causes note-only classic mod plays to give incorrect values when replayed.
- Requires MathNet.Numerics, a package providing advanced mathematical functions not present in the .NET standard library.
SR/PP spreadsheet: https://docs.google.com/spreadsheets/d/1SBFrFOWIxJM-gACZrOk-2I6mH35MN9FRUDXbrTcGnJs/edit As of a2197e2
sorry, I'm not sure what happened with those commits
I can't see a clear way to test for the multiplier on lazer LN tails, or whether rate adjustments affect the mania hit windows, but I implemented some simple tests to detect whether changes affect the UR estimation or the hit window formulas. I'm considering adding a test to compare the estimated UR of a play to its real UR, but that may be more work than it's worth. Any and all suggestions are appreciated.
Decided to go back to adding the new curve in this PR after a discussion with Evening. @smoogipoo, if you find some time, could you make a smoogisheet for this? Be sure to either append classic to the scores before feeding them into the sheet or set isLegacyScore to always be true, as this rework currently returns lazer values for non-classic-mod scores.
Is it possible to adjust the test .osu file to something that can be understood? i.e., it shouldn't be a full beatmap with 8,000 objects, but a limited number of objects, so that you can parse the file visually and understand the results obtained.
Tests should be more manageable now.
I've run an SR/PP spreadsheet for these changes, and attached it to the OP. Please have a look over it to make sure that the values are expected :)
Sheet looks fine. I cross-checked a few values and they're all fine.
Will require reapproval due to https://github.com/ppy/osu/pull/24636
Just to be safe...
!diffcalc RULESET=mania
Target: https://github.com/ppy/osu/pull/22613 Spreadsheet: https://docs.google.com/spreadsheets/d/1qZxiHg_RhDf0w9id2DW2YqiE4JCxNRhuUKrOtIIpjzM/edit
Deployment considerations:
- 3 added difficulty attributes
- 3 rows at 13 B each
- we have ~19400 mania-specific beatmaps, for which only 16 mod combinations need to be considered = ~12 MB of storage overhead
- plus ~111700 converts, which have 16 * 14 available mod combinations (keymods) = ~976 MB of storage overhead
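The two storage figures above can be reproduced with a short calculation. This is only a sketch: it assumes one 13 B row per added attribute for every beatmap and mod combination, and decimal megabytes (1 MB = 1,000,000 B). The constant and function names are made up for illustration.

```python
BYTES_PER_ROW = 13
ADDED_ATTRIBUTES = 3  # assumed: one row per added difficulty attribute


def storage_overhead_bytes(beatmaps, mod_combinations):
    """Extra bytes needed for the new attributes across all mod combinations."""
    return beatmaps * mod_combinations * ADDED_ATTRIBUTES * BYTES_PER_ROW


# Mania-specific beatmaps: 16 mod combinations each.
native_bytes = storage_overhead_bytes(19_400, 16)
# Converts: 16 mod combinations times 14 keymods each.
convert_bytes = storage_overhead_bytes(111_700, 16 * 14)

print(f"mania-specific: {native_bytes / 1e6:.1f} MB")  # 12.1 MB
print(f"converts:       {convert_bytes / 1e6:.1f} MB")  # 975.8 MB
```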
Seemingly nothing further to point out from the infra side.
@bdach Before you start any code structure reviews, what are your passing thoughts on the structure of https://github.com/Natelytle/osu/blob/maniastataccrefactor (specifically Utils/LogProb.cs)? I find it's much better for parsing the mathematics since the log stuff is abstracted away, but I'd like a second opinion.
> what are your passing thoughts on the structure of https://github.com/Natelytle/osu/blob/maniastataccrefactor (specifically Utils/LogProb.cs)?
I'm not sure I'm going to be doing math/methodology correctness reviews on diffcalc anymore myself as I'm not sure that it's productive or even needed at this point. I was more angling to do just general code quality stuff. That said, since you asked directly, I'm not sure what to think of that thing. Some of the operations there have me worried about correctness of function domain and/or numerical stability - specifically I'm talking about stuff like presenting
$$ \log (x+y) = \log \left( x \cdot \left( 1 + \frac{y}{x} \right) \right) = \log x + \log \left( 1 + \frac{y}{x} \right) $$
etc (assuming $y > x$). It seems to be symbolically correct but I'm not sure how it's going to behave in the wild as I'm rusty on my analysis. Some other operations in there just seem plain arbitrary (what is LogProb.Combine()? does it represent some real operation / have any semantic meaning rather than be an arbitrary expression on logs that happens to be useful in this case?)
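For reference, a common way to evaluate that identity when only $\log x$ and $\log y$ are stored is the log-sum-exp trick. The Python sketch below is an illustration of that standard technique and makes no claim about what LogProb.cs actually does; `log_add` is a hypothetical name. It factors out the larger of the two terms, so the ratio passed to `exp` always lies in $(0, 1]$ and cannot overflow.

```python
import math


def log_add(log_x, log_y):
    """Return log(x + y) given log(x) and log(y), staying in log space."""
    # log(0) is represented as -inf; adding zero leaves the other term unchanged.
    if log_x == -math.inf:
        return log_y
    if log_y == -math.inf:
        return log_x
    hi, lo = max(log_x, log_y), min(log_x, log_y)
    # exp(lo - hi) is in (0, 1]; log1p keeps precision when that value is tiny.
    return hi + math.log1p(math.exp(lo - hi))


# Probabilities far too small to represent directly still combine cleanly.
print(log_add(-1000.0, -1001.0))
```

The naive form `math.log(math.exp(-1000.0) + math.exp(-1001.0))` fails here, because both exponentials underflow to 0.0 and the logarithm of zero is undefined.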
Again, as per my opening post, I'd probably not ding this in review because I'm not sure I care about the meat and potatoes of diffcalc changes at this point, but yeah. If these operations are meant to have some probabilistic interpretation then there should be xmldoc that explains what it is, otherwise it just doesn't seem like a very robust abstraction.
Oh and if that thing is gonna stay then please change the name to something resembling full words, LogProbability is fine.
Closing due to complexity and the lack of a real use for this in the system.