fleet icon indicating copy to clipboard operation
fleet copied to clipboard

Fix flaky `TestUpdateStatsOnReplica`

Open getvictor opened this issue 1 year ago • 2 comments

The replica DB test frequently fails

Last_SQL_Errno:1146 Last_SQL_Error:Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at master log bin.000003, end_log_pos 53420339. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

It appears the replica DB may be not set up correctly.

Ideas:

  • During set up, make a dummy write so that we can be sure replica is up and running before the test.
  • On fail, dump contents of performance_schema.replication_applier_status_by_worker (or the MySQL 5.7 equivalent)

getvictor avatar Jun 27 '24 12:06 getvictor

Hey team! Please add your planning poker estimate with Zenhub @getvictor @jacobshandling @lucasmrod @mostlikelee @RachelElysia

sharon-fdm avatar Jun 28 '24 15:06 sharon-fdm

@getvictor I think I found a fix for this in https://github.com/fleetdm/fleet/pull/20121. Running integration tests in parallel (https://github.com/fleetdm/fleet/pull/20085) makes this consistently fail on MySQL 8, that's why I took a look.

I think the fix works because I pushed a branch with both the parallel runs + the fix and the test passes there.

Could you take a look at the PR?

roperzh avatar Jul 01 '24 12:07 roperzh

Set up the replica, Peaceful as morning's first light. Tests run, flawless flight.

fleet-release avatar Jul 17 '24 23:07 fleet-release