zebra
zebra copied to clipboard
feat(test): Check that the state service handles errors correctly when committing finalized blocks
Motivation
We recently refactored the process of committing blocks to the finalized state. This PR tests if Zebra handles errors correctly.
Solution
In this PR, we trigger a failure when committing a block, and make sure the state service handles it as expected.
Reviewer Checklist
- [ ] Will the PR name make sense to users?
- [ ] Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
- [ ] Are the PR labels correct?
- [ ] Does the code do what the ticket and PR says?
- [ ] How do you know it works? Does it have tests?
Follow Up Work
We might need to do a similar test for the non-finalized state as well.
The new test works well locally.
The new test works well locally.
The new test also works ok on my machine.
But I discovered that it hangs when there is only 1 CPU available. Which should be OK, some tests require multiple cores.
But maybe it also hangs in CI when there are only 2 virtual CPU cores available? That might be a timing issue.
Here's how I tested it with just one core:
$ taskset 0x01 cargo test -- state_handles_error_correctly
I'm really not sure how to make progress on this one. We could update the ticket with the progress we made, and close the PR?
(Sorry, I realise this has happened a bit to your work lately, I wish I could be more helpful!)
We could also disable the test by default in CI using #[ignore]
, and merge it?
Then it will run locally when people ask for it, but it won't break CI or other weird test environments.
But maybe it also hangs in CI when there are only 2 virtual CPU cores available? That might be a timing issue.
I don't think it's a timing issue because it consistently it hangs in CI and passes locally.
We could create an issue for when larger runners become available, they're currently in public beta for enterprise: https://github.blog/changelog/2022-09-01-github-actions-larger-runners-are-now-in-public-beta
We could add some logging to each line in each thread, and see there it hangs in CI? But that could be a lot of work for a low-priority test.
@mpguerra what do you think? Should we work on this sync test more, or just close the PR and update the ticket with this draft branch?
@mpguerra what do you think? Should we work on this sync test more, or just close the PR and update the ticket with this draft branch?
Let's just close the PR and update with draft branch. Sorry @upbqdn :(
I discovered that it hangs when there is only 1 CPU available.
You're right, running
taskset 0x1 cargo test -- state_handles_error_correctly
dead-locks since there' no CPU activity. However, running the test on two CPUs like this
taskset 0x3 cargo test -- state_handles_error_correctly
works fine. I was trying to find out where the dead-lock happens, but I didn't manage to figure it out.
Let's just close the PR and update with draft branch.
Sure, I also referenced this PR in ticket #2654 so that it's easy to find the description and the comments. I also updated the description of this PR.