RAWeb icon indicating copy to clipboard operation
RAWeb copied to clipboard

Fixes for Achievement Distribution selections

Open TwosomesUP opened this issue 1 year ago • 18 comments

Initial commit and composer fix Update achievement distribution selections and migration to Eloquent Update calculateBuckets and handleAllAchievementsCase to accommodate selection update Update V1Test to accommodate selection update

Resolves: #2399

TwosomesUP avatar May 24 '24 17:05 TwosomesUP

There are a lot of changes in this branch that seem unrelated to the achievement distribution issue. I'm not sure what happened - perhaps composer fix was overzealous.

We should thin out this branch so only your specific changes are included, though.

wescopeland avatar May 24 '24 17:05 wescopeland

Yep, composer fix definitely seemed to have went above and beyond. I can see if I can pull out those changes.

TwosomesUP avatar May 24 '24 17:05 TwosomesUP

@wescopeland additional changes added by composer fix have been removed.

TwosomesUP avatar May 24 '24 18:05 TwosomesUP

Doing some local testing, the new implementation appears to be an order of magnitude slower than what's currently in master. I haven't dived into any possible reason why this may be occurring - I'm just looking purely at query execution time.

These queries are taken from a game that has 175,000 player_achievements records associated to it.

master Screenshot 2024-05-24 at 2 14 58 PM

This branch Screenshot 2024-05-24 at 2 14 44 PM

wescopeland avatar May 24 '24 18:05 wescopeland

Got it. I can investigate the SQL as well to see if there's any way to optimize it. What's curious is that this query pulls from player_games, which seems to have 1 row per player per game, instead of player_achievements which is 1 row per achievement per player.

There wouldn't be a SQL statement I could run locally that would insert 200k rows into player_achievements would there? :)

TwosomesUP avatar May 24 '24 18:05 TwosomesUP

Ah, truthfully I didn't even notice that these queries aren't reading from player_achievements, I just made a (poor) assumption 😅

wescopeland avatar May 24 '24 18:05 wescopeland

I made some minor optimizations to the queries, and they seem to be executing faster, but I'll need to set up some volume testing on my end to be sure.

image

TwosomesUP avatar May 24 '24 20:05 TwosomesUP

Thanks for taking a look. Just checked on my end with latest from the branch, seems about the same 😟

Screenshot 2024-05-24 at 4 53 35 PM

wescopeland avatar May 24 '24 20:05 wescopeland

Sorry, same problem on latest 😔

Branch Screenshot 2024-05-24 at 8 24 22 PM

master Screenshot 2024-05-24 at 8 24 45 PM

wescopeland avatar May 25 '24 00:05 wescopeland

No worries! I'm going to break this off into another branch to play around with. Hopefully I can figure something out that's more performant. 😁

TwosomesUP avatar May 25 '24 02:05 TwosomesUP

Ok, I was able to insert about 25k rows into player_games for game_id 1 with a variety of unlock combinations, and I'm definitely seeing the performance hit now!

image

image

This should allow me to run better unit tests with this selection. 😄

TwosomesUP avatar May 25 '24 15:05 TwosomesUP

Hey @wescopeland !

I did some extensive testing/tinkering with the Achievement Distribution query and as the data is currently structured, I'm not sure we'd be able to get the selection much faster than it currently is. The fastest I could get it was 50ms for 25k rows on the player_games table.

That said, there is a different approach we could take to speed this up, but it would require altering the player_games table and adding a new stored Generated Column for softcore unlocks, and an index. This has a much bigger impact, so I'll leave it up to RA if you all would like to consider this route, but here are my results from local testing:

Legacy Legacy

PR Aggregate (Eloquent) Aggregate

Alter + Index (Eloquent) WithAlterAndIndex_Eloquent

Alter + Index (DB::select) WithAlterAndIndex_DBSelect

I'm not sure if there is a configuration difference with my instance of MySQL but the Alter + Index seems to have equivalent performance compared to the legacy query. For your reference, here is the SQL I used for the alter and the index build:

alter table player_games
add column achievements_unlocked_softcore int unsigned generated always as (greatest(achievements_unlocked - achievements_unlocked_hardcore, 0)) stored;

alter table player_games add index player_games_game_id_achievements_unlocked_softcore_index (game_id, achievements_unlocked_softcore)

Again, this can have a big impact since it alters a table with potentially a million+ rows in production, and it would calculate the softcore unlocks for each one of those (one time). Once altered, the column would be calculated for a row upon insert or recalculated upon update.

With that in mind, it's completely understandable if this is something RA would not want to pursue at the moment.

TwosomesUP avatar May 26 '24 13:05 TwosomesUP

I had a feeling needing a new denormalized column would be how things might shake out 😆

That said, I would be fine with pursuing this. Before moving forward, I think another maintainer should also sign off on the idea though (cc @RetroAchievements/web-maintainers). I can think back to other times where I've needed to perform a calculation to peek into a user's mixed progress. Just having the raw value would've been more helpful and performant.

Going down this road is a more significant lift and is much more fraught with peril. It would need to occur in two phases (releases): a write phase and a read phase.

For the write phase, we would need a PR/changeset which includes:

  • The migration itself, in database/migrations/platform.
  • Changes made to the UpdatePlayerGameMetrics action so these new values get populated.
  • Updated tests, etc.

I personally haven't triggered a full resync of player game metrics on the server, but I know we've done it once or twice in the past when we first began experimenting with the performance benefits of denormalized data. Once the code is deployed, we'd trigger a full resync, which would queue up probably about 10,000,000 jobs. This would take quite some time to finish.

For the read phase, it'd essentially be a PR containing what you have now, but instead reading from these new denormalized values.

wescopeland avatar May 26 '24 14:05 wescopeland

Great! I can start researching the write phase steps this week. I may reach out for guidance on a few things if I hit a wall, but I'll be sure to let you know.

TwosomesUP avatar May 26 '24 14:05 TwosomesUP

@wescopeland a PR for the Write Phase (#2477) has been submitted. If/when the new PR is approved and merged, I can update this PR to accommodate the changes. Just let me know if there's anything I may have missed, or if you have any questions/concerns.

Thanks!

TwosomesUP avatar May 30 '24 14:05 TwosomesUP

@wescopeland circling back to this and after some testing, it seems like the automatic "Soft Delete" check when selecting from the player_games table using Eloquent and the PlayerGame model is impacting performance. Specifically when the deleted_at column is included in the selection. Here are the timings for selections with and without the soft delete check:

Eloquent with Soft Delete check:

$countQuery = PlayerGame::selectRaw("$countColumn as AwardedCount, count(*) as NumUniquePlayers")
            ->where("$countColumn", ">", 0)
            ->where('player_games.game_id', $gameID)
            ->groupBy('AwardedCount')
            ->orderByDesc('AwardedCount');

image

Eloquent without Soft Delete check (code same as above):

image *This one removes the use SoftDeletes from the PlayerGame model. Probably don't want to do this. :)

DB::table() select without Soft Delete check:

$countQuery = DB::table('player_games')
            ->selectRaw("$countColumn as AwardedCount, count(*) as NumUniquePlayers")
            ->where("$countColumn", ">", 0)
            ->where('player_games.game_id', $gameID)
            ->groupBy('AwardedCount')
            ->orderByDesc('AwardedCount');

image *This one bypasses the PlayerGame model and does a direct DB select

TwosomesUP avatar Jul 20 '24 20:07 TwosomesUP

@TwosomesUP Sorry for my late reply here. I'm in the process of setting up some production data to test with, which should be done later today. As soon as that is set up, getting you a review will be my highest priority 👍

wescopeland avatar Aug 10 '24 13:08 wescopeland

I appreciate it Wes, but no rush! I still need to update this pull request with the new selects that account for the new achievements_unlocked_softcore column.

I've been on vacation, so I haven't had a chance to get to it yet, but I'll try to get that pushed either today or tomorrow.

TwosomesUP avatar Aug 10 '24 13:08 TwosomesUP

Hi @TwosomesUP! Looks like we may have a small merge conflict. Then, assuming @Jamiras approves, we can merge to get this in the next release.

wescopeland avatar Sep 19 '24 21:09 wescopeland