aerospike-client-python
Are Scans Interruptable?
Greetings,
We're performing a very basic scan (using foreach) over a large data set. Hitting CTRL-C does nothing once the scan has started. Is there a way to interrupt a scan in the client once it has started?
(I hesitate to use more drastic measures (like kill -9) for fear of tying up resources at the server.)
Update: Possibly related to GIL-release? http://stackoverflow.com/a/33652496/593047
Thanks!
We're investigating this, and so far it doesn't seem that the GIL is grabbed and released incorrectly, but it would be good if you could post some reproducible code we can use to debug.
Thanks Ronen. Actually, I just wrote up a minimal example but it /did/ in fact stop when I hit CTRL-C. I'm looking into my original code now to see what might account for the different behavior. Will report back here later tonight, either way.
Found it: it happens only when concurrent=True. Here's some code that demonstrates the problem (client v1.0.59).
import aerospike
from time import sleep

client = aerospike.client({'hosts': ['localhost']}).connect()

def cb_print((key, meta, bins)):
    print bins
    sleep(2)

scan = client.scan('mynamespace', 'myset')
scan.foreach(cb_print, policy={'timeout': 5000}, options={'concurrent': True})
CTRL-C won't stop that script. But change concurrent to False and CTRL-C works as expected.
We pre-released client 2.0.0 in this repo. Can you please clone and build it from source with python setup.py install? We'd like to get some feedback ahead of publishing the release to PyPI. I'd like to see if it's still an issue.
Thanks, will check and report back.
Bad news: problem still exists in client 2.0.1.
Is it possible that the GIL is not released when concurrent=True?
With concurrent=False, observe how the heartbeats in a test program are interleaved with the scan callback:
-- heartbeat --
-- heartbeat --
starting scan
IN SCAN CALLBACK: {'bin1': 'val1'}
-- heartbeat --
-- heartbeat --
IN SCAN CALLBACK: {'bin1': 'val3'}
-- heartbeat --
-- heartbeat --
IN SCAN CALLBACK: {'bin1': 'val2'}
-- heartbeat --
-- heartbeat --
When concurrent=True, we observe no heartbeats while the scan is in progress:
-- heartbeat --
-- heartbeat --
starting scan
IN SCAN CALLBACK: {'bin1': 'val1'}
IN SCAN CALLBACK: {'bin1': 'val3'}
IN SCAN CALLBACK: {'bin1': 'val2'}
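
The test program that produced these heartbeats is not shown in the thread. A minimal sketch of such a probe, assuming it was a background thread ticking while the main thread did the blocking work, might look like this in Python 3. The helper name run_with_heartbeat is hypothetical, and time.sleep stands in for the scan so the sketch runs without a server.

```python
import threading
import time


def run_with_heartbeat(work, interval=0.01):
    """Run work() on the calling thread while a background thread ticks.

    Returns the number of heartbeats recorded while work() was running.
    """
    beats = []
    stop = threading.Event()

    def heartbeat():
        # Each tick needs the GIL, so ticks stall while a C call
        # holds the GIL without releasing it.
        while not stop.is_set():
            beats.append(time.time())
            time.sleep(interval)

    ticker = threading.Thread(target=heartbeat)
    ticker.daemon = True
    ticker.start()
    try:
        work()
    finally:
        stop.set()
        ticker.join()
    return len(beats)


# time.sleep releases the GIL, so it stands in for a well-behaved blocking call.
beats_during_sleep = run_with_heartbeat(lambda: time.sleep(0.3))
print("heartbeats during work:", beats_during_sleep)
```

If the work passed in were a call that kept the GIL for its whole duration, the count would stay near zero, which matches the pattern reported above for concurrent=True.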
There is no difference in the Python client GIL handling for concurrent=True vs. False. The difference is how the underlying C client handles the call. For concurrent=True, the C client uses one thread per node to perform the scan and waits until all threads complete. For concurrent=False, the C client uses the current thread to loop through all the nodes in sequence.
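
To make the two dispatch shapes concrete, here is a rough Python 3 sketch of the description above. It is an illustration only, not the C client's code: scan_sequential, scan_concurrent, record, and the node names are all hypothetical.

```python
import threading

seen = []  # (mode, node, ran_on_calling_thread)


def record(mode, node, caller):
    # Note which thread the per-node work actually ran on.
    seen.append((mode, node, threading.current_thread() is caller))


def scan_sequential(nodes, callback):
    # concurrent=False shape: the calling thread visits each node in turn.
    for node in nodes:
        callback(node)


def scan_concurrent(nodes, callback):
    # concurrent=True shape: one worker thread per node; the caller only waits.
    workers = [threading.Thread(target=callback, args=(node,)) for node in nodes]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


caller = threading.current_thread()
nodes = ["node-a", "node-b"]  # placeholder node names
scan_sequential(nodes, lambda n: record("sequential", n, caller))
scan_concurrent(nodes, lambda n: record("concurrent", n, caller))

for entry in seen:
    print(entry)
```

In the sequential shape the callback runs on the calling thread; in the concurrent shape it never does, even with a single node. CPython runs signal handlers, including the one behind CTRL-C, only on the main thread, so which thread is busy may matter here, though nothing in this thread confirms that as the cause.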
So there is some interaction between the Python client and the C client in the multiple thread path of the code.
How many nodes do you have in your cluster?
I see--thanks for clarifying.
I'm running my test program against my dev cluster, which has just 1 node.
Does the C client call the foreach callback directly? Or is control passed back to the Python client layer for each record?
The callback goes from the C client -> Python client -> user code.
Returning false from the user callback should halt the scan.
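
As a sketch of that happy path in Python 3: the callback below collects a couple of records and then returns False. So that it runs without a server, fake_foreach is a hypothetical stand-in that only mimics the stop-on-False behavior described here, and the records are made-up placeholders; with the real client the callback would be passed to scan.foreach instead.

```python
def fake_foreach(records, callback):
    """Hypothetical stand-in for scan.foreach: stop when the callback returns False."""
    for record in records:
        if callback(record) is False:
            break


def take_at_most(limit, sink):
    """Build a callback that collects up to `limit` records, then halts the scan."""
    def callback(record):
        key, meta, bins = record
        sink.append(bins)
        if len(sink) >= limit:
            # Returning False asks the scan to stop.
            return False
        # Falling through returns None, which lets the scan continue.
    return callback


# Placeholder records shaped like the (key, meta, bins) tuples in the example above.
records = [
    (("mynamespace", "myset", "k%d" % i), {"gen": 1, "ttl": 0}, {"bin1": "val%d" % i})
    for i in range(1, 6)
]

collected = []
fake_foreach(records, take_at_most(2, collected))
print(collected)
```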
> The callback goes from the C client -> Python client -> user code.
I see. Does the Python client release the GIL before calling the user code?
> Returning false from the user callback should halt the scan.
Got it, thanks. Good to know; but just to be clear, that isn't related to this issue.
The Python client releases the GIL before calling the C client and then reacquires it in the callback from the C client before calling into user code.
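
The same release-then-reacquire pattern can be observed from pure Python with ctypes, which releases the GIL around calls into a CDLL library and reacquires it before running a Python callback. The sketch below is an analogy using libc's qsort, not the Aerospike client's code, and it assumes a POSIX-like system where the C library can be loaded this way.

```python
import ctypes
import ctypes.util

# Assumption: POSIX-like platform. If find_library returns None,
# CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C-callable comparison function type: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)

calls = []


def compare(a, b):
    # Runs as a callback from C; ctypes has reacquired the GIL at this point.
    calls.append((a[0], b[0]))
    return (a[0] > b[0]) - (a[0] < b[0])


libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]

values = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
# ctypes releases the GIL for the duration of this C call.
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), CMPFUNC(compare))
print(list(values))
```

A C extension has to do both steps by hand, which is what the GIL calls in foreach.c discussed in this thread are for.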
Yeah, I was just trying to answer the "Are Scans Interruptable?" question in the "happy path". This is clearly not the happy path. :-)
Can you tell me whether I'm reading the code correctly? I'm looking here:
https://github.com/aerospike/aerospike-client-python/blob/master/src/main/scan/foreach.c#L58
and I see exactly what you described: the Python client callback function acquires the GIL before calling out to the Python user code callback.
But when I look here:
https://github.com/aerospike/aerospike-client-python/blob/master/src/main/scan/foreach.c#L156
it looks like the GIL is /not/ released before the Python client calls the C client's foreach function. My understanding is thus: the GIL is held by the Python client for the duration of the entire scan, not just for the duration of each call to the callback, as designed.
- I'm not sure I'm reading the code correctly. Is my assessment correct?
- Even if it is correct (i.e. there is a bug in the GIL handling during scans), is that the reason for the non-interruptible scans?
- Will correcting this bug (presumably by releasing the GIL before the call to aerospike_scan_foreach) cause a performance hit, since the GIL would then actually be acquired and released each time through the callback?
Thanks!
Line 153 releases the GIL: PyThreadState * _save = PyEval_SaveThread();
I will be looking at this along with the fork() issue today. These are both somewhat complex interactions between the Python client and the C client, so it may take a few days to resolve.
Ah, I thought that PyEval_SaveThread didn't release the GIL, but you're right, I see now that it does. Thanks!