python-driver
Fix executor shutdown
Usually Cluster.executor is shut down after Cluster.scheduler, but in some cases the executor is shut down earlier, probably when Cluster.executor is manipulated directly or when the process is killed while a Cluster instance is still alive.
When this happens, you will see the following exception:
Traceback (most recent call last):
File "lib/python3.13/threading.py", line 1041, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "cassandra/cluster.py", line 4429, in run
future = self._executor.submit(fn, *args, **kwargs)
File "lib/python3.13/concurrent/futures/thread.py", line 171, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
This PR makes the driver handle this case gracefully, not scheduling anything and not failing.
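For context, the failure mode is easy to reproduce outside the driver. The sketch below uses plain `concurrent.futures` (not driver code) and shows the same `RuntimeError` being raised when a task is submitted to an executor that has already been shut down:

```python
# Minimal sketch, plain concurrent.futures (not driver code): submitting
# work to an executor that has already been shut down raises the same
# RuntimeError the scheduler thread hits in the traceback above.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown(wait=True)  # executor stops first, e.g. during process teardown

try:
    executor.submit(print, "late task")  # a scheduler-like thread still submits work
except RuntimeError as exc:
    print("got expected error:", exc)  # cannot schedule new futures after shutdown
```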
Pre-review checklist
- [x] I have split my patch into logically separate commits.
- [x] All commit messages clearly explain what they change and why.
- [ ] ~~I added relevant tests for new features and bug fixes~~.
- [x] All commits compile, pass static checks and pass tests.
- [x] PR description sums up the changes and reasons why they should be introduced.
- [x] I have provided docstrings for the public items that I want to introduce.
- [ ] ~~I have adjusted the documentation in ./docs/source/.~~
- [ ] ~~I added appropriate Fixes: annotations to the PR description.~~
What is the task that is being scheduled? I know about this error, but I didn't fix it this way before because it is the wrong way, in my opinion. We should first end all tasks / threads (or whatever they are called here), and only after they are stopped can the scheduler be stopped. Ignoring this error is not fixing the issue but just masking it.
I can see it hurting us in the future - perhaps queries that were not finished will wait forever? Or something will schedule a task and await its result, causing a hang? I'd like to avoid issues similar to cassandra-stress being stuck on shutdown.
It is actually already done. Here _Scheduler waits for its thread to complete on shutdown: https://github.com/scylladb/python-driver/blob/63110b0cb6098472029f16ce66667a96a97034c5/cassandra/cluster.py#L4399-L4407
And here Cluster shuts down _Scheduler first and then shuts down the executor: https://github.com/scylladb/python-driver/blob/63110b0cb6098472029f16ce66667a96a97034c5/cassandra/cluster.py#L1888-L1895
Which means that the given issue can happen only when the shutdown procedure does not go as it should:
either when a test or user manipulates the executor directly, or when the process is killed and the Cluster instance is not shut down properly.
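For readers without the linked code open, here is a heavily simplified sketch of that ordering (hypothetical names, not the driver's actual implementation): the scheduler joins its worker thread before the cluster shuts the executor down, so nothing should be submitted to the executor after it stops.

```python
# Simplified sketch of the shutdown ordering described above (hypothetical
# names, not the driver's actual code).
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SchedulerSketch:
    def __init__(self, executor):
        self._executor = executor
        self._queue = queue.Queue()
        self.is_shutdown = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self.is_shutdown:
            fn = self._queue.get()
            if fn is None:
                break
            self._executor.submit(fn)  # the call that raises once the executor is down

    def schedule(self, fn):
        self._queue.put(fn)

    def shutdown(self):
        self.is_shutdown = True
        self._queue.put(None)  # wake the worker thread
        self._thread.join()    # wait for it to finish before returning


executor = ThreadPoolExecutor(max_workers=2)
scheduler = SchedulerSketch(executor)
scheduler.schedule(lambda: print("scheduled task"))
scheduler.shutdown()          # scheduler first, as Cluster.shutdown() does...
executor.shutdown(wait=True)  # ...then the executor, so nothing is submitted late
```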
Wait, so the error can happen in one of 3 conditions:
- There is some unknown bug in our code
- Test does something weird
- Process is killed
The third case is not important with regard to warnings; I think it is ok to print them. In the second case, I think the test should be fixed so that it does not produce the warning, instead of silencing the warning globally. Perhaps the test needs to do some weird stuff, but if that is the case maybe it can temporarily silence the warnings (see the sketch below)?
The first case is why the warning should not be ignored, IMO. If we fix the second case, and don't kill the process, and still see the warning, then it indicates a bug which we should fix, right?
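Purely as an illustration of the "temporarily silence it" option (hypothetical test name, and only applicable if the noise surfaces as a Python warning rather than a raised exception), a locally scoped filter could look like:

```python
# Hypothetical illustration, not an existing test: silence the noise only
# inside the one test that deliberately does weird things to the executor,
# instead of filtering it globally.
import warnings


def test_direct_executor_manipulation():
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # scoped to this block only
        # ... the test's deliberately weird executor manipulation goes here ...
        warnings.warn("executor poked after shutdown", RuntimeWarning)  # stand-in
```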
Correct. Let's do this:
- Make it log the exception and not throw it (roughly as in the sketch below).
- Fix the test case that manipulates the executor directly and see if this error still appears on the CI/CD.
WDYT?
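A hedged sketch of what the "log it, don't throw" option could look like (assumed helper name, not the final patch):

```python
# Sketch of the "log it, don't throw" option discussed above (assumed
# helper name, not the actual patch): catch the post-shutdown RuntimeError,
# log it, and drop the task instead of letting the scheduler thread die
# with an unhandled exception.
import logging

log = logging.getLogger(__name__)


def submit_or_log(executor, fn, *args, **kwargs):
    try:
        return executor.submit(fn, *args, **kwargs)
    except RuntimeError:
        # Executor already shut down, e.g. process teardown or a test
        # manipulating Cluster.executor directly; skip the task.
        log.exception("Not scheduling task; executor is already shut down")
        return None
```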
IMO, let's first fix the test cases - then we will see if there is any need to avoid throwing.
If I understand correctly, this one is a cosmetic change to reduce that garbage in test logs/output, right? I don't think I saw those in dtest or core tests, or got any complaints from users about it.
Not saying we shouldn't attend to it, just recapping what the goal of this change is.