aquadoggo
Solutions for race conditions in tests (and possibly outside of them too)
Turns out there were at least two places causing tests to sometimes fail, all quite interesting in the end. Although this PR was mainly about getting the tests working consistently, I actually ended up adding a little feature ;-)
Cross-test pollution of mutably shared schema provider
```rust
static SCHEMA_PROVIDER: Lazy<Mutex<Vec<Schema>>> = Lazy::new(|| Mutex::new(Vec::new()));
```

- We're using the above to allow the GraphQL schema builder access to the current schemas, in particular in `create_type_info`. This works, but when running multiple tests concurrently I think `SCHEMA_PROVIDER` is being overwritten in one process before it is read in another. We take a `MutexGuard` in `save_static_schemas()` but it is dropped as it goes out of scope, and this leaves an opening where another lock can be taken out before the value is read in the same test.
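The dropped-guard problem can be sketched with a plain `std::sync::Mutex` (a simplified stand-in for the `once_cell` `Lazy` static above; names mirror the PR but the types are reduced to `String` for illustration):

```rust
use std::sync::Mutex;

// Simplified sketch: the guard taken inside save_static_schemas() only
// lives until the function returns.
static SCHEMA_PROVIDER: Mutex<Vec<String>> = Mutex::new(Vec::new());

fn save_static_schemas(schemas: &[String]) {
    let mut guard = SCHEMA_PROVIDER.lock().unwrap();
    guard.clear();
    guard.extend_from_slice(schemas);
} // guard dropped here: another concurrently running test can now take
  // the lock and overwrite the schemas before this test ever reads them
```

Because the guard's lifetime ends at the closing brace, nothing serialises the write with the later read in the same test.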
- What we want to do then is keep the lock until we read again. We can return the guard, keeping it alive longer, but if we keep it too long we simply block our own read, and we can't pass the guard on because we don't have access to the `async_graphql` methods.
- So maybe we need a yielding mutex like `tokio::sync::Mutex`, which wakes a future when a lock is released? But we can't, because locking it is `async` and `create_type_info` isn't.
SOLUTION: We can force the problematic tests to run in series: https://docs.rs/serial_test/0.4.0/serial_test/ :star:
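Conceptually, `#[serial]` from the `serial_test` crate boils down to taking one shared lock around each annotated test body. A minimal stdlib-only sketch of that idea (illustrative only, not the crate's actual implementation; `run_serial` is a made-up helper name):

```rust
use std::sync::Mutex;

// One global lock shared by all serialised tests.
static SERIAL_LOCK: Mutex<()> = Mutex::new(());

// Roughly what the #[serial] attribute arranges for each test body.
fn run_serial<F: FnOnce()>(test_body: F) {
    let _guard = SERIAL_LOCK.lock().unwrap();
    // Only one serialised test body executes at a time, so the shared
    // SCHEMA_PROVIDER can't be overwritten mid-test.
    test_body();
}
```

With the real crate this is just `#[serial]` on the test function; the rest of the suite still runs in parallel.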
Waiting for stuff to happen
- it takes a little while for documents to be materialised into the db
- it takes a little while more for new schema to be registered in the GraphQL server
- in both these cases we were adding a small sleep to wait for the stuff to happen, but this isn't an ideal solution: it slows down the tests, and "how long is long enough?"
SOLUTION: a new state messaging channel for all services :star:
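The shape of such a channel can be sketched with `std::sync::mpsc` (the enum variants and `wait_for` helper are assumptions for illustration, not the actual API this PR introduces):

```rust
use std::sync::mpsc;

// Hypothetical status messages a service might report.
#[derive(Debug, PartialEq)]
enum ServiceMessage {
    DocumentMaterialized,
    SchemaRegistered,
}

// Instead of sleeping for an arbitrary duration, a test blocks until the
// service announces the state it is waiting for.
fn wait_for(rx: &mpsc::Receiver<ServiceMessage>, expected: ServiceMessage) {
    for message in rx.iter() {
        if message == expected {
            return;
        }
    }
    panic!("channel closed before expected message arrived");
}
```

The test then proceeds exactly when the state is reached, with no guessed sleep duration.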
closes: #247
📋 Checklist
- [x] Add tests that cover your changes
- [x] Add this PR to the Unreleased section in
CHANGELOG.md
- [x] Link this PR to any issues it closes
- [x] New files contain a SPDX license header
Codecov Report
Base: 93.95% // Head: 94.17% // Increases project coverage by +0.21% :tada:

Coverage data is based on head (`6f069f5`) compared to base (`670281b`). Patch coverage: 97.91% of modified lines in pull request are covered.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #266      +/-   ##
==========================================
+ Coverage   93.95%   94.17%   +0.21%
==========================================
  Files          59       59
  Lines        4928     5027      +99
==========================================
+ Hits         4630     4734     +104
+ Misses        298      293       -5
```
Impacted Files | Coverage Δ | |
---|---|---|
aquadoggo/src/node.rs | 85.18% <66.66%> (+0.56%) | :arrow_up: |
aquadoggo/src/materializer/service.rs | 95.41% <96.22%> (+0.52%) | :arrow_up: |
aquadoggo/src/graphql/client/dynamic_query.rs | 91.01% <100.00%> (+0.14%) | :arrow_up: |
...quadoggo/src/graphql/client/dynamic_types/tests.rs | 100.00% <100.00%> (ø) | |
aquadoggo/src/graphql/client/mutation.rs | 100.00% <100.00%> (ø) | |
aquadoggo/src/graphql/client/static_query.rs | 100.00% <100.00%> (ø) | |
aquadoggo/src/graphql/client/tests.rs | 100.00% <100.00%> (ø) | |
aquadoggo/src/graphql/schema.rs | 92.95% <100.00%> (+0.20%) | :arrow_up: |
aquadoggo/src/http/service.rs | 95.34% <100.00%> (+0.47%) | :arrow_up: |
aquadoggo/src/manager.rs | 96.00% <100.00%> (+0.16%) | :arrow_up: |
... and 7 more | | |
:umbrella: View full report at Codecov.
@sandreae, after some experimentation I thought it might be a good idea to keep this PR open for now and rather only merge the parts which fix the race conditions. You can find it here: https://github.com/p2panda/aquadoggo/pull/269 - What do you think? Is this a good idea?
I think it will get us to what we want faster: fixing the annoying race conditions. The newly introduced feature is very helpful and definitely more elegant, but is maybe not "solved" yet, as we're both not happy with its API design. It doesn't give us much functionality outside of tests but already affects the API for other non-test parts of the stack, making them harder to grasp. I'd rather merge the important bit for now and keep this open until we find something pretty!
Leaving your feature changes unmerged still keeps the second class of race conditions unfixed, but we can at least "control" them by bumping the waiting times. Also not beautiful, but it would come with no big changes for now.
We can keep this open and make a new ticket which deals with introducing that feature? If you're happy with this, I'll open #269 for review and create the ticket.
In any case, thanks so much for digging into all of that! What a ride!
In this and the new branch I sadly still had the following failing test (as I've mentioned in the chat):

```
thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26
```
Hey! Thanks for looking into this. I think what you're suggesting is a good move. It's a shame we might not use a lot of the code I wrote here, but maybe it was unavoidable. In any case, it's good research which will feed into the next design phase!
> In this and the new branch I sadly had still the following failing test (as I've mentioned on the chat):
> `thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26`
Hmm... interesting.... let me see if I can reproduce it.
Tried reproducing this but so far I've never seen a failure, having run it 20x or so... :disappointed:
Let's move this conversation to the new PR though :+1:
I've created two new issues to tackle the separate topics:
- https://github.com/p2panda/aquadoggo/issues/271
- https://github.com/p2panda/aquadoggo/issues/270
> ... it's a shame we might not use a lot of the code I did here

This is definitely not going to waste, I think we got some really good learnings from this which will help us keep working on it :+1:
I've closed this PR for now (keeping the branch intact if we still need it). One reason is that it is stale; the other is that I think we can tackle this issue from another perspective: rather than changing our code for better tests, we can introduce new test utilities.
My suggestion would be to have a "timeout helper" which checks for a condition every x milliseconds and times out if that condition wasn't reached after y seconds. That way we can test that async conditions eventually arrive. I've seen that pattern in other projects as well and think that this is the way to go for at least the test-related problems.
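Such a helper could look roughly like this (a stdlib-only sketch; the name `wait_until` and its signature are assumptions, not an agreed API):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Poll `condition` every `interval` until it holds, giving up once
// `timeout` has elapsed. Returns whether the condition was reached.
fn wait_until<F: FnMut() -> bool>(
    mut condition: F,
    interval: Duration,
    timeout: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if condition() {
            return true;
        }
        thread::sleep(interval);
    }
    // One last check right at the deadline.
    condition()
}
```

A test would then assert on the helper's return value, e.g. "the document appears in the database within 5 seconds", instead of hard-coding a sleep.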