opentelemetry-python
opentelemetry-python copied to clipboard
Try to understand and fix test_shutdown_wait_last_export
Description
TL;DR The time assert in the test was wrong.
Looking at the original commit (c84ba9) of this test, the delay of the mocked backoff was 4 seconds fixed with 6 iterations. So the time until shutdown returned was around 24 seconds. The default timeout of the lock wait in shutdown did not hit before the retry of the exporter was over.
The line self.assertGreaterEqual(now, (start_time + 30 / 1000))
seemed to be wrong from the beginning. It just ensured that the shutdown call took at least 30ms. I assume it was supposed to do something else. But to me it's not clear what exactly it should do.
One explanation would be that it was intended to do self.assertGreaterEqual(now, (start_time + 30 * 1000))
instead. But that would mean: Shutdown should need at least 30 seconds to pass, which does not make sense imho.
Another option would be that it was intended to do self.assertLessEqual(now, (start_time + 30 * 1000))
instead. That would mean: Shutdown should need at maximum 30 seconds to pass, which actually could make more sense. It would ensure that the call of self.exporter.shutdown
returns after 30 seconds.
My biggest issue here is, that it tests the happy path where the retry is over before the timeout while there is no mechanism to ensure that the exporter thread is stopped in a sensible time in case it's current exponential backoff is in a longer sleep already while the shutdown flag is set. If I understand it correctly, it can hit the beginning of a 32 second sleep, which could mean that a process needs a) 30 seconds to signal the shutdown and additional 32 seconds to finish the exporter thread. But that's something for another change.
After all these words, looking at the current HEAD, the situation of the test case is basically the same as originally. The retry interval has been changed recently to be way shorter. The default lock timeout stayed the same at 30 seconds, but that does not matter too much since the retry mechanism is finished long before that. And the aforementioned assert is still wrong, with some additional problem from #4014.
Fixes the issue mentioned here: https://github.com/open-telemetry/opentelemetry-python/pull/4014#discussion_r1668929743
Type of change
Please delete options that are not relevant.
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
- [x] Run the changed tests
Does This PR Require a Contrib Repo Change?
- [ ] Yes. - Link to PR:
- [x] No.
Checklist:
- [x] Followed the style guidelines of this project
- [ ] Changelogs have been updated
- [ ] Unit tests have been added
- [ ] Documentation has been updated