python-driver
Fix executor shutdown
Usually Cluster.executor is shut down after Cluster.scheduler, but in some cases the executor is shut down earlier, probably when Cluster.executor is manipulated directly or when the process is killed while a Cluster instance is still alive.
When this happens, you will see the following exception:
Traceback (most recent call last):
File "lib/python3.13/threading.py", line 1041, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "cassandra/cluster.py", line 4429, in run
future = self._executor.submit(fn, *args, **kwargs)
File "lib/python3.13/concurrent/futures/thread.py", line 171, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
This PR makes the driver handle this case gracefully, not scheduling anything and not failing.
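For context, the failure mode is easy to reproduce outside the driver. The sketch below uses plain `concurrent.futures` (not driver code) and shows the same `RuntimeError` being raised when a task is submitted to an executor that has already been shut down:

```python
# Minimal sketch, plain concurrent.futures (not driver code): submitting
# work to an executor that has already been shut down raises the same
# RuntimeError the scheduler thread hits in the traceback above.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown(wait=True)  # executor stops first, e.g. during process teardown

try:
    executor.submit(print, "late task")  # a scheduler-like thread still submits work
except RuntimeError as exc:
    print("got expected error:", exc)  # cannot schedule new futures after shutdown
```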
Pre-review checklist
- [x] I have split my patch into logically separate commits.
- [x] All commit messages clearly explain what they change and why.
- [ ] ~~I added relevant tests for new features and bug fixes~~.
- [x] All commits compile, pass static checks and pass tests.
- [x] PR description sums up the changes and reasons why they should be introduced.
- [x] I have provided docstrings for the public items that I want to introduce.
- [ ] ~~I have adjusted the documentation in ./docs/source/.~~
- [ ] ~~I added appropriate Fixes: annotations to the PR description.~~
What is the task that is being scheduled? I know about this error, but I didn't fix it this way before because it is the wrong way, in my opinion. We should first end all tasks / threads (or whatever they are called here), and only after they are stopped can the scheduler be stopped. Ignoring this error is not fixing the issue but just masking it.
I can see it hurting us in the future - perhaps queries that were not finished will wait forever? Or something will schedule a task and await its result, causing a hang? I'd like to avoid issues similar to cassandra-stress being stuck on shutdown.
It is actually already done. Here _Scheduler waits for its thread to complete on shutdown: https://github.com/scylladb/python-driver/blob/63110b0cb6098472029f16ce66667a96a97034c5/cassandra/cluster.py#L4399-L4407
And here Cluster shuts down _Scheduler first and then shuts down the executor: https://github.com/scylladb/python-driver/blob/63110b0cb6098472029f16ce66667a96a97034c5/cassandra/cluster.py#L1888-L1895
Which means that the given issue can happen only when the shutdown procedure does not go as it should:
either when a test or user manipulates the executor directly, or when the process is killed and the Cluster instance is not shut down properly.
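For readers without the linked code open, here is a heavily simplified sketch of that ordering (hypothetical names, not the driver's actual implementation): the scheduler joins its worker thread before the cluster shuts the executor down, so nothing should be submitted to the executor after it stops.

```python
# Simplified sketch of the shutdown ordering described above (hypothetical
# names, not the driver's actual code).
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SchedulerSketch:
    def __init__(self, executor):
        self._executor = executor
        self._queue = queue.Queue()
        self.is_shutdown = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self.is_shutdown:
            fn = self._queue.get()
            if fn is None:
                break
            self._executor.submit(fn)  # the call that raises once the executor is down

    def schedule(self, fn):
        self._queue.put(fn)

    def shutdown(self):
        self.is_shutdown = True
        self._queue.put(None)  # wake the worker thread
        self._thread.join()    # wait for it to finish before returning


executor = ThreadPoolExecutor(max_workers=2)
scheduler = SchedulerSketch(executor)
scheduler.schedule(lambda: print("scheduled task"))
scheduler.shutdown()          # scheduler first, as Cluster.shutdown() does...
executor.shutdown(wait=True)  # ...then the executor, so nothing is submitted late
```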
Wait, so the error can happen in one of 3 conditions:
- There is some unknown bug in our code
- Test does something weird
- Process is killed
The third case is not important with regard to warnings; I think it is ok to print them. In the second case, I think the test should be fixed so that it does not produce the warning, instead of silencing the warning globally. Perhaps the test needs to do some weird stuff, but if that is the case maybe it can temporarily silence the warnings (see the sketch below)?
The first case is why the warning should not be ignored, IMO. If we fix the second case, and don't kill the process, and still see the warning, then it indicates a bug which we should fix, right?
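Purely as an illustration of the "temporarily silence it" option (hypothetical test name, and only applicable if the noise surfaces as a Python warning rather than a raised exception), a locally scoped filter could look like:

```python
# Hypothetical illustration, not an existing test: silence the noise only
# inside the one test that deliberately does weird things to the executor,
# instead of filtering it globally.
import warnings


def test_direct_executor_manipulation():
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # scoped to this block only
        # ... the test's deliberately weird executor manipulation goes here ...
        warnings.warn("executor poked after shutdown", RuntimeWarning)  # stand-in
```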
Correct. Let's do this:
- Make it log the exception and not throw it (roughly as in the sketch below).
- Fix the test case that manipulates the executor directly and see if this error still appears on the CI/CD.
WDYT?
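A hedged sketch of what the "log it, don't throw" option could look like (assumed helper name, not the final patch):

```python
# Sketch of the "log it, don't throw" option discussed above (assumed
# helper name, not the actual patch): catch the post-shutdown RuntimeError,
# log it, and drop the task instead of letting the scheduler thread die
# with an unhandled exception.
import logging

log = logging.getLogger(__name__)


def submit_or_log(executor, fn, *args, **kwargs):
    try:
        return executor.submit(fn, *args, **kwargs)
    except RuntimeError:
        # Executor already shut down, e.g. process teardown or a test
        # manipulating Cluster.executor directly; skip the task.
        log.exception("Not scheduling task; executor is already shut down")
        return None
```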
IMO, let's first fix the test cases - then we will see if there is any need to avoid throwing.
If I understand correctly, this one is a cosmetic change to reduce that garbage in test logs/output, right? I don't think I saw those in dtest or core tests, or got any complaints from users about it.
Not saying we shouldn't attend to it, just recapping what the goal of this change is.