citus icon indicating copy to clipboard operation
citus copied to clipboard

Hopefully reduce flaky tests by disabling the maintenance daemon

Open JelteF opened this issue 3 years ago • 1 comments

Sometimes our CI randomly fails on a test in a way similar to this:

 step s2-drop:
     DROP TABLE cancel_table;
-
+ <waiting ...>
+step s2-drop: <... completed>

 starting permutation: s1-timeout s1-begin s1-sleep10000 s1-rollback s1-reset s1-drop

Source: https://app.circleci.com/pipelines/github/citusdata/citus/26524/workflows/5415b84f-13a3-482f-bef9-648314c79a67/jobs/756377

Another example of a failure like this:

 stop_session_level_connection_to_node
 -------------------------------------
                                      
 (1 row)
 
 step s3-display: 
  SELECT * FROM ref_table ORDER BY id, value;
  SELECT * FROM dist_table ORDER BY id, value;
-
+ <waiting ...>
+step s3-display: <... completed>
 id|value
 --+-----

Source: https://app.circleci.com/pipelines/github/citusdata/citus/26551/workflows/91dca4b2-bb1c-4cae-b2ef-ce3f9c689ce5/jobs/757781

A step that shouldn't be blocked is detected as "waiting..." temporarily and then gets unblocked automatically immediately after. I'm not certain of the reason for this, but one explanation is that the maintenance daemon is doing something that blocks the query. In the shown case my hunch is that it could be the deferred shard deletion.

This PR disables all the features of the maintenance daemon during isolation testing to try and prevent process from randomly being detected as blocking.

NOTE: I'm not certain that this will actually fix this issue. If the issue persists even after this change, at least we know that it's not the maintenance daemon that's blocking it.

JelteF avatar Aug 26 '22 11:08 JelteF

Another example of a similar issue:

 stop_session_level_connection_to_node
 -------------------------------------
                                      
 (1 row)
 
 step s3-display: 
  SELECT * FROM ref_table ORDER BY id, value;
  SELECT * FROM dist_table ORDER BY id, value;
-
+ <waiting ...>
+step s3-display: <... completed>
 id|value
 --+-----

Source: https://app.circleci.com/pipelines/github/citusdata/citus/26551/workflows/91dca4b2-bb1c-4cae-b2ef-ce3f9c689ce5/jobs/757781

JelteF avatar Aug 26 '22 14:08 JelteF