Add UR estimation to the osu!mania ruleset

Natelytle opened this issue 2 years ago • 13 comments

Massive thanks to Frost for the majority of the math behind this rework, and to Evening, Molneya, and Komirin for guidance.

Estimates Unstable Rate (UR) and uses it in place of accuracy when scaling performance

Changes

  • Raw accuracy percentages are no longer used to scale performance
  • Instead, the UR of the score is estimated from the play's judgements and the sizes of the hit windows, and is used to scale performance in a hit-window-agnostic way

Reasoning

  • The way hit windows work differs massively between ScoreV1, ScoreV2, and osu!lazer scoring. Because of this, the same play receives a different accuracy value depending on which system is used.
  • Performance previously did not scale with OD.
  • Unstable rate is an easier metric to work with, as it has a true "perfect" value (a UR of 0).

Estimation Theory

In order to estimate UR, we assume all hits are normally distributed with a mean of 0 and a standard deviation σ. For a given σ, this gives the probability that any single hit lands in each hit window. We compare these probabilities to the actual proportion of each judgement in the play and return whichever σ value gives the closest match. Further documentation can be found in a Google Doc.
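One way to make "closest match" concrete is maximum likelihood. The Python sketch below is an illustration under stated assumptions, not the PR's C# implementation, and the PR's exact matching criterion may differ. It assumes hit errors $X \sim \mathcal{N}(0, \sigma^2)$, so a hit lands within a window of half-width $w$ with probability $\operatorname{erf}(w / (\sigma\sqrt{2}))$; it takes judgement counts ordered best to worst with misses last, searches a fixed grid of σ values (an assumed search strategy), and reports UR as $10\sigma$, the usual osu! definition of unstable rate. The function names and window values are hypothetical.

```python
import math


def judgement_probabilities(sigma, windows):
    """Probability that a hit error X ~ N(0, sigma^2) falls in each judgement band.

    windows: ascending hit-window half-widths in ms, best judgement first.
    Returns one probability per window, followed by the probability of a miss.
    """
    scale = sigma * math.sqrt(2.0)
    probs = []
    lower = 0.0
    for w in windows:
        # P(lower < |X| <= w) = erfc(lower / (sigma*sqrt2)) - erfc(w / (sigma*sqrt2));
        # erfc is used instead of erf to keep small tail probabilities accurate.
        probs.append(math.erfc(lower / scale) - math.erfc(w / scale))
        lower = w
    # Anything beyond the widest window is a miss: P(|X| > lower).
    probs.append(math.erfc(lower / scale))
    return probs


def log_likelihood(sigma, counts, windows):
    """Log-likelihood of the observed judgement counts under a given sigma."""
    total = 0.0
    for n, p in zip(counts, judgement_probabilities(sigma, windows)):
        if n == 0:
            continue
        if p <= 0.0:
            return -math.inf  # this sigma cannot produce the observed judgement
        total += n * math.log(p)
    return total


def estimate_sigma(counts, windows, lo=0.05, hi=300.0, step=0.05):
    """Grid search (assumed strategy) for the sigma in ms that maximises the likelihood."""
    best_sigma, best_ll = lo, -math.inf
    for i in range(int(round((hi - lo) / step)) + 1):
        sigma = lo + i * step
        ll = log_likelihood(sigma, counts, windows)
        if ll > best_ll:
            best_sigma, best_ll = sigma, ll
    return best_sigma


def estimate_ur(counts, windows):
    """Unstable rate is conventionally 10x the standard deviation of hit errors."""
    return 10.0 * estimate_sigma(counts, windows)


# Illustrative hit-window half-widths in ms (assumed values, not taken from the PR).
example_windows = [16.0, 40.0, 73.0, 103.0, 127.0]
# Judgement counts: MAX, 300, 200, 100, 50, miss.
example_counts = [900, 80, 15, 4, 1, 0]
print(estimate_ur(example_counts, example_windows))
```

The grid search is chosen for simplicity and robustness; a bracketing method such as golden-section search would need far fewer evaluations if the likelihood is unimodal in σ.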

Considerations

  • Mania hit windows may change as mania in lazer is ironed out. This would need to be addressed, including an update to the current tests.
  • The Classic mod doesn't change the results of a replay, forcing me to use a workaround that causes note-only Classic mod plays to give incorrect values when replayed.
  • Requires MathNet.Numerics, a numerical library providing mathematical functions that are not available in the C# standard library.

SR/PP spreadsheet (as of a2197e2): https://docs.google.com/spreadsheets/d/1SBFrFOWIxJM-gACZrOk-2I6mH35MN9FRUDXbrTcGnJs/edit

Natelytle, Feb 12 '23

sorry, I'm not sure what happened with those commits

Natelytle, Feb 21 '23

I can't see a clear way to test the multiplier on lazer LN tails, or whether rate adjustments affect the mania hit windows, but I implemented some simple tests that check whether changes affect the UR estimation or alter the hit window formulas. I'm considering adding a test that compares the estimated UR of a play to the play's real UR, but that may be more work than it's worth. Any and all suggestions are appreciated.

Natelytle, Apr 16 '23

Decided to go back to adding the new curve in this PR after a discussion with Evening. @smoogipoo, if you find some time, could you make a smoogisheet for this? Be sure to add the Classic mod to the scores before feeding them into the sheet, or set isLegacyScore to always be true, as this rework currently returns lazer values for scores without the Classic mod.

Natelytle, Apr 17 '23

Is it possible to adjust the test .osu file to something that can be understood? i.e. it shouldn't be a full beatmap with 8,000 objects, but a limited number of objects, so you can parse the file visually and understand the results obtained.

peppy, Jul 30 '23

Tests should be more manageable now.

Natelytle, Aug 14 '23

I've run an SR/PP spreadsheet for these changes, and attached it to the OP. Please have a look over it to make sure that the values are expected :)

smoogipoo, Aug 14 '23

Sheet looks fine; I cross-checked a few values and they're all fine.

Natelytle, Aug 16 '23

Will require reapproval due to https://github.com/ppy/osu/pull/24636

Natelytle, Sep 04 '23

Just to be safe...

!diffcalc RULESET=mania

smoogipoo, May 28 '24

Target: https://github.com/ppy/osu/pull/22613
Spreadsheet: https://docs.google.com/spreadsheets/d/1qZxiHg_RhDf0w9id2DW2YqiE4JCxNRhuUKrOtIIpjzM/edit

github-actions[bot], May 28 '24

Deployment considerations:

  • 3 added difficulty attributes
    • 3 rows of 13 B each per beatmap/mod combination
    • we have ~19400 mania-specific beatmaps, for which only 16 mod combinations need to be considered = ~12 MB of storage overhead
    • plus ~111700 converts, which have 16 * 14 available mod combinations (keymods) = ~976 MB of storage overhead
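The storage estimates above follow from multiplying beatmaps × mod combinations × 3 rows × 13 B. A quick Python check of that arithmetic (the helper name is hypothetical, and MB here means $10^6$ bytes):

```python
ROWS_PER_COMBINATION = 3  # one row per added difficulty attribute
BYTES_PER_ROW = 13


def overhead_bytes(beatmaps, mod_combinations):
    """Extra storage for the new attributes across all beatmap/mod combinations."""
    return beatmaps * mod_combinations * ROWS_PER_COMBINATION * BYTES_PER_ROW


mania_specific = overhead_bytes(19_400, 16)  # 12,105,600 B, i.e. ~12 MB
converts = overhead_bytes(111_700, 16 * 14)  # 975,811,200 B, i.e. ~976 MB

print(f"mania-specific: {mania_specific / 1e6:.0f} MB")
print(f"converts:       {converts / 1e6:.0f} MB")
```

Almost all of the overhead comes from converts, because each one multiplies the 16 mod combinations by the 14 keymods.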

Seemingly nothing further to point out from the infra side.

bdach, May 29 '24

@bdach Before you start any code structure reviews, what are your passing thoughts on the structure of https://github.com/Natelytle/osu/blob/maniastataccrefactor (specifically Utils/LogProb.cs)? I find it's much better for parsing the mathematics since the log stuff is abstracted away, but I'd like a second opinion.

Natelytle, May 30 '24

what are your passing thoughts on the structure of https://github.com/Natelytle/osu/blob/maniastataccrefactor (specifically Utils/LogProb.cs)

I'm not sure I'm going to be doing math/methodology correctness reviews on diffcalc anymore myself as I'm not sure that it's productive or even needed at this point. I was more angling to do just general code quality stuff. That said, since you asked directly, I'm not sure what to think of that thing. Some of the operations there have me worried about correctness of function domain and/or numerical stability - specifically I'm talking about stuff like presenting

$$ \log (x+y) = \log \left( x \cdot \left( 1 + \frac{y}{x} \right) \right) = \log x + \log \left( 1 + \frac{y}{x} \right) $$

etc. (assuming $y > x$). It seems to be symbolically correct, but I'm not sure how it's going to behave in the wild as I'm rusty on my analysis. Some other operations in there just seem plain arbitrary (what is LogProb.Combine()? Does it represent some real operation or have any semantic meaning, rather than being an arbitrary expression on logs that happens to be useful in this case?)
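A common way to keep log-space addition numerically stable is to factor out whichever term is larger, so the ratio inside $\log(1 + \cdot)$ never exceeds 1 and `log1p`/`expm1` keep their precision. The Python sketch below shows that standard technique for addition and subtraction; it is not the `LogProb.cs` code under discussion, and the function names are hypothetical.

```python
import math


def log_add(log_x, log_y):
    """Return log(x + y) given log(x) and log(y), without leaving log space."""
    if log_x < log_y:
        log_x, log_y = log_y, log_x  # factor out the larger term
    if log_y == -math.inf:
        return log_x  # y == 0, so x + y == x
    # log(x + y) = log x + log(1 + y/x), where y/x = exp(log_y - log_x) lies in (0, 1]
    return log_x + math.log1p(math.exp(log_y - log_x))


def log_sub(log_x, log_y):
    """Return log(x - y) given log(x) >= log(y)."""
    if log_y > log_x:
        raise ValueError("log(x - y) is undefined for y > x")
    if log_y == -math.inf:
        return log_x  # y == 0
    if log_y == log_x:
        return -math.inf  # x - y == 0
    d = log_y - log_x  # log(y/x) < 0
    # log(1 - e^d): expm1 is more accurate when d is near 0, log1p when d is very negative
    if d > -math.log(2.0):
        return log_x + math.log(-math.expm1(d))
    return log_x + math.log1p(-math.exp(d))
```

Subtraction matters for this model because band probabilities such as $P(a < |X| \le b)$ are differences of CDF values; `log_sub` computes them without exponentiating back out of log space.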

Again, as per my opening post, I'd probably not ding this in review because I'm not sure I care about the meat and potatoes of diffcalc changes at this point, but yeah. If these operations are meant to have some probabilistic interpretation, then there should be xmldoc that explains what it is; otherwise it just doesn't seem like a very robust abstraction.

Oh and if that thing is gonna stay then please change the name to something resembling full words, LogProbability is fine.

bdach, May 31 '24

Closing due to complexity and the lack of a real use for this in the system.

Natelytle, Jan 14 '25