fleet Fix flaky `TestUpdateStatsOnReplica`

The replica DB test frequently fails

Last_SQL_Errno:1146 Last_SQL_Error:Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at master log bin.000003, end_log_pos 53420339. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

It appears the replica DB may be not set up correctly.

Ideas:

During set up, make a dummy write so that we can be sure replica is up and running before the test.
On fail, dump contents of performance_schema.replication_applier_status_by_worker (or the MySQL 5.7 equivalent)

Jun 27 '24 12:06 getvictor

Hey team! Please add your planning poker estimate with Zenhub @getvictor @jacobshandling @lucasmrod @mostlikelee @RachelElysia

Jun 28 '24 15:06 sharon-fdm

@getvictor I think I found a fix for this in https://github.com/fleetdm/fleet/pull/20121. Running integration tests in parallel (https://github.com/fleetdm/fleet/pull/20085) makes this consistently fail on MySQL 8, that's why I took a look.

I think the fix works because I pushed a branch with both the parallel runs + the fix and the test passes there.

Could you take a look at the PR?

Jul 01 '24 12:07 roperzh

Set up the replica, Peaceful as morning's first light. Tests run, flawless flight.

Jul 17 '24 23:07 fleet-release