tests/integration: data store mgr tests unstable
The data store mgr tests, or a subset of them, sometimes fail and sometimes pass.
An example traceback:
def test_update_data_structure(harness):
    """Test update_data_structure. This method will generate and
    apply deltas/updates given."""
    schd, data = harness
    w_id = schd.data_store_mgr.workflow_id
    schd.data_store_mgr.data[w_id] = data
    assert TASK_STATUS_FAILED not in set(collect_states(data, TASK_PROXIES))
    assert TASK_STATUS_FAILED not in set(collect_states(data, FAMILY_PROXIES))
    assert TASK_STATUS_FAILED not in data[WORKFLOW].state_totals
>   assert len({t.is_held for t in data[TASK_PROXIES].values()}) == 2
E   assert 0 == 2
E     +0
E     -2
The test file:
tests/integration/test_data_store_mgr.py
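The 0 on the left of that comparison is what you would get if the data store held no task proxies at all, since a set comprehension over an empty mapping is empty. A plain-Python illustration (no cylc-flow imports; task_proxies here is just a stand-in for data[TASK_PROXIES]):

from types import SimpleNamespace

# Stand-in for data[TASK_PROXIES]: proxy IDs mapped to objects with an
# is_held attribute. With one held and one unheld proxy, the set of
# distinct is_held values has length 2, which is what the test expects.
task_proxies = {
    'foo': SimpleNamespace(is_held=True),
    'bar': SimpleNamespace(is_held=False),
}
assert len({t.is_held for t in task_proxies.values()}) == 2

# If the harness yields a store with no task proxies at all, the same
# expression evaluates over an empty set, reproducing "assert 0 == 2".
task_proxies = {}
assert len({t.is_held for t in task_proxies.values()}) == 0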
I've run it twenty times locally with no failure... I will see if there's a more robust way to test the update.
I'd have to understand how the test harness works at a deeper level to know why it's inconsistent...
i.e. why does this:
@pytest.mark.asyncio
@pytest.fixture(scope='module')
async def harness(mod_flow, mod_scheduler, mod_run):
    flow_def = {
        'scheduler': {
            'allow implicit tasks': True
        },
        'scheduling': {
            'graph': {
                'R1': 'foo => bar'
            }
        }
    }
    reg: str = mod_flow(flow_def)
    schd: 'Scheduler' = mod_scheduler(reg)
    async with mod_run(schd):
        schd.pool.hold_tasks('*')
        schd.resume_workflow()
        # Think this is needed to save the data state at first start (?)
        # Fails without it... and a test needs to overwrite schd data with this.
        data = schd.data_store_mgr.data[schd.data_store_mgr.workflow_id]
        yield schd, data
... sometimes yield a scheduler with no task proxies? Did the tasks not exist when the hold command was issued?
Hard to tell unless I can reproduce the issue somehow... (the test in question still worked when I commented out the hold).
I think the same harness yield gets shared amongst the tests. Whenever I change the order of the tests, it appears to affect what data the next tests are working with.
So perhaps create another harness object (if possible) and see if it occurs again... Again, easier if I could reproduce.
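For what it's worth, a per-test harness might look like the sketch below. It assumes the function-scoped counterparts of the mod_* fixtures (flow, scheduler, run) are available in the integration conftest, which is an assumption on my part; the obvious cost is that every test starts its own scheduler:

import pytest


@pytest.fixture
async def fresh_harness(flow, scheduler, run):
    """A fresh scheduler and data store per test, rather than per module."""
    reg = flow({
        'scheduler': {'allow implicit tasks': True},
        'scheduling': {'graph': {'R1': 'foo => bar'}},
    })
    schd = scheduler(reg)
    async with run(schd):
        schd.pool.hold_tasks('*')
        schd.resume_workflow()
        data = schd.data_store_mgr.data[schd.data_store_mgr.workflow_id]
        yield schd, data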
On a side note, this also works:
(flow) sutherlander@cortex-vbox:cylc-flow$ git diff
diff --git a/tests/integration/test_data_store_mgr.py b/tests/integration/test_data_store_mgr.py
index 2d6d584cc..533c30a60 100644
--- a/tests/integration/test_data_store_mgr.py
+++ b/tests/integration/test_data_store_mgr.py
@@ -259,8 +259,8 @@ def test_update_data_structure(harness):
     schd.data_store_mgr.data[w_id] = data
     assert TASK_STATUS_FAILED not in set(collect_states(data, TASK_PROXIES))
     assert TASK_STATUS_FAILED not in set(collect_states(data, FAMILY_PROXIES))
-    assert TASK_STATUS_FAILED not in data[WORKFLOW].state_totals
-    assert len({t.is_held for t in data[TASK_PROXIES].values()}) == 2
+    assert data[WORKFLOW].state_totals[TASK_STATUS_FAILED] == 0
+    assert sum(data[WORKFLOW].state_totals.values()) == 2
     for itask in schd.pool.get_all_tasks():
         itask.state.reset(TASK_STATUS_FAILED)
         schd.data_store_mgr.delta_task_state(itask)
So I don't think the key lookup (TASK_STATUS_FAILED not in state_totals) is doing what the test intended...
I made this change because the data store manager now resets the state totals for families being updated:
# Use all states to clean up pruned counts
for state in TASK_STATUSES_ORDERED:
    fp_delta.state_totals[state] = state_counter.get(state, 0)
fp_updated.setdefault(fp_id, PbFamilyProxy()).MergeFrom(fp_delta)
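If the workflow's state_totals gets the same treatment, every known state ends up with an explicit entry even when its count is zero, so a key-membership check no longer tells you anything and you have to look at the count instead, which is what the diff above does. A plain-dict illustration (the real field is a protobuf map, but key membership behaves the same way once a key has been assigned):

# Before: only states with a non-zero count were present, so
# `state in totals` doubled as a "count > 0" check.
totals = {'waiting': 1, 'running': 1}
assert 'failed' not in totals                    # passes

# After: every known state is written explicitly, zero or not, so the
# membership check always succeeds and the old assertion breaks.
for state in ('waiting', 'running', 'failed'):   # stand-in for TASK_STATUSES_ORDERED
    totals[state] = totals.get(state, 0)
assert 'failed' in totals                        # the key now exists...
assert totals['failed'] == 0                     # ...but the count is still zero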
> Sometimes yield the scheduler with no task proxies?
Ah, perhaps we have to explicitly get the pool to release tasks, otherwise there is a race condition between the async with and the async def main_loop?
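That race is easy to picture in plain asyncio, with no cylc-flow involved: once the main loop runs as a background task, the fixture/test body interleaves with it at every await, so whether the body sees the store populated depends entirely on how far the loop has got by the time the body reads it. A minimal sketch of the pattern:

import asyncio
from contextlib import asynccontextmanager, suppress

store = {'task_proxies': {}}


async def main_loop():
    # Stand-in for the scheduler main loop populating the data store.
    await asyncio.sleep(0)
    store['task_proxies'] = {'foo': True, 'bar': True}
    while True:
        await asyncio.sleep(0.01)


@asynccontextmanager
async def run():
    task = asyncio.create_task(main_loop())
    try:
        await asyncio.sleep(0)  # start-up awaits give the loop a chance to run
        yield
    finally:
        task.cancel()
        with suppress(asyncio.CancelledError):
            await task


async def test_body():
    async with run():
        # Whether this sees {} or the populated proxies depends on how many
        # times control has returned to the event loop before the read;
        # add or remove awaits above and the answer changes.
        print(store['task_proxies'])


asyncio.run(test_body())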
I assume this is still an issue?
I still run into this one, it seems to happen more often with a higher level of parallelism in the tests. Still have no idea why it's happening.
I was playing with a small change to the integration tests which caused these tests to break badly, so I took a look in.
I think the problem is:
1. The main loop is running while the tests are running, which means the scheduler can do things in between tests, or not, depending on how asyncio schedules its tasks; so the data store may or may not be updated between tests. This would explain why I see these failures a lot, whereas we don't hit them much on GHA (because I parallelise the hell out of the tests!).
2. The tests all share the same workflow, but they also modify it and don't undo their changes, which means the tests can interact (see the sketch below).
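A tiny, cylc-free illustration of (2): with a module-scoped fixture, whatever one test does to the shared object is still there for the next test, so the outcome depends on test order, which matches the earlier observation that reordering the tests changes what data they see:

import pytest


@pytest.fixture(scope='module')
def shared_state():
    # Created once per module, like the shared harness/workflow.
    return {'held': {'foo', 'bar'}}


def test_mutates(shared_state):
    shared_state['held'].clear()    # the "undo" never happens


def test_sees_mutation(shared_state):
    # Passes or fails depending on whether test_mutates ran first,
    # i.e. on test ordering: the same coupling the data store tests have.
    assert shared_state['held'] == {'foo', 'bar'}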
I've addressed (1) on my branch by adding a new fixture that lets us start a scheduler without running the main loop, to give these tests a fighting chance.
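Roughly the shape of that fixture, as a sketch only: the real one lives in the integration conftest and will differ, and start_no_main_loop here is a hypothetical name for whatever starts the scheduler's services without ever scheduling the main loop:

import pytest


@pytest.fixture(scope='module')
async def harness(mod_flow, mod_scheduler, start_no_main_loop):
    reg = mod_flow({
        'scheduler': {'allow implicit tasks': True},
        'scheduling': {'graph': {'R1': 'foo => bar'}},
    })
    schd = mod_scheduler(reg)
    async with start_no_main_loop(schd):  # hypothetical fixture name
        # With no main loop running, nothing mutates the data store behind
        # the tests' backs; they only see the deltas they apply themselves.
        data = schd.data_store_mgr.data[schd.data_store_mgr.workflow_id]
        yield schd, data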
I've fixed the resulting failures bar one, which I don't understand in test_update_data_structure:
# Shows pruning worked
assert len({t.is_held for t in data[TASK_PROXIES].values()}) == 1
The test resets a task state to failed, then expects the number of distinct is_held values to change as a result? Commented out for now.
Done (1) in #4620. It may have helped; however, with high parallelism I still get flaky failures ~50% of the time for:
- test_delta_task_held
- test_update_data_structure
Well, I'm not sure what to do about (2) other than have the tests operate on different workflows, as it's not something that would happen in the wild...
Also, I'm not sure the tests could "undo" their changes: a) wouldn't parallelism still cause issues while undoing? b) the cheapest way to undo would be to reload/reinitialise the data store, assuming other tests aren't running at the time.
Is this still a problem? It seems fairly stable now.