aerospike-client-python

Are Scans Interruptable?

Open RonRothman opened this issue 9 years ago • 15 comments

Greetings,

We're performing a very basic scan (using foreach) over a large data set. Hitting CTRL-C does nothing once the scan has started. Is there a way to interrupt a scan in the client once it has started?

(I hesitate to use more drastic measures (like kill -9) for fear of tying up resources at the server.)

Update: Possibly related to GIL-release? http://stackoverflow.com/a/33652496/593047

Thanks!

RonRothman avatar Feb 05 '16 05:02 RonRothman

We're investigating this, and so far it doesn't seem that the GIL is grabbed and released incorrectly, but it would be good if you could post some reproducible code we can use to debug.

rbotzer avatar Feb 05 '16 21:02 rbotzer

Thanks Ronen. Actually, I just wrote up a minimal example but it /did/ in fact stop when I hit CTRL-C. I'm looking into my original code now to see what might account for the different behavior. Will report back here later tonight, either way.

RonRothman avatar Feb 07 '16 02:02 RonRothman

Found it: it happens only when concurrent=True. Here's some code that demonstrates the problem (client v1.0.59).

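# Python 2 syntax (tuple parameter unpacking, print statement), client v1.0.59
import aerospike
from time import sleep

client = aerospike.client({'hosts': ['localhost']}).connect()

# Called once per record; the sleep keeps the scan running long enough to try CTRL-C
def cb_print((key, meta, bins)):
    print bins
    sleep(2)

scan = client.scan('mynamespace', 'myset')
# With 'concurrent': True, CTRL-C is ignored; with False it works
scan.foreach(cb_print, policy={'timeout': 5000}, options={'concurrent': True})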

CTRL-C won't stop that script. But change concurrent to False and CTRL-C works as expected.

RonRothman avatar Feb 07 '16 03:02 RonRothman

We pre-released client 2.0.0 in this repo. Can you please clone and build it from source with python setup.py install? We'd like to get some feedback ahead of publishing the release to PyPI. I'd like to see if it's still an issue.

rbotzer avatar Feb 13 '16 02:02 rbotzer

Thanks, will check and report back.

RonRothman avatar Feb 18 '16 04:02 RonRothman

Bad news: problem still exists in client 2.0.1.

RonRothman avatar Mar 08 '16 19:03 RonRothman

Is it possible that the GIL is not released when concurrent=True?

With concurrent=False, observe how the heartbeats in a test program are interleaved with the scan callback:

  -- heartbeat --
  -- heartbeat --
starting scan
IN SCAN CALLBACK: {'bin1': 'val1'}
  -- heartbeat --
  -- heartbeat --
IN SCAN CALLBACK: {'bin1': 'val3'}
  -- heartbeat --
  -- heartbeat --
IN SCAN CALLBACK: {'bin1': 'val2'}
  -- heartbeat --
  -- heartbeat --

When concurrent=True, we observe no heartbeats while the scan is in progress:

  -- heartbeat --
  -- heartbeat --
starting scan
IN SCAN CALLBACK: {'bin1': 'val1'}
IN SCAN CALLBACK: {'bin1': 'val3'}
IN SCAN CALLBACK: {'bin1': 'val2'}

RonRothman avatar Mar 08 '16 19:03 RonRothman
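
The test program behind the output above was not posted. A heartbeat check of this kind might look like the following sketch (Python 3). The names `count_heartbeats_during` and `gil_friendly_workload` are hypothetical, and `time.sleep` stands in for the scan call, so this only illustrates the diagnostic idea and does not reproduce the Aerospike behavior.

```python
import threading
import time


def count_heartbeats_during(workload, interval=0.01):
    """Run workload() in the calling thread and count background heartbeats."""
    beats = []
    stop = threading.Event()

    def heartbeat():
        # Each iteration needs the GIL. If the workload held the GIL for its
        # whole duration, no beats would be recorded until it finished.
        while not stop.is_set():
            beats.append(time.monotonic())
            time.sleep(interval)

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        workload()
    finally:
        stop.set()
        worker.join()
    return len(beats)


def gil_friendly_workload():
    # Stand-in for a blocking call that releases the GIL while it waits.
    time.sleep(0.2)


beat_count = count_heartbeats_during(gil_friendly_workload)
print("heartbeats during workload:", beat_count)
```

Replacing the stand-in workload with the real `scan.foreach(...)` call would show whether other Python threads get to run while the scan is in progress.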

There is no difference in the Python client GIL handling for concurrent=True vs. False. The difference is how the underlying C client handles the call. For concurrent=True, the C client uses one thread per node to perform the scan and waits until all threads complete. For concurrent=False, the C client uses the current thread to loop through all the nodes in sequence.

So there is some interaction between the Python client and the C client in the multiple thread path of the code.

How many nodes do you have in your cluster?

jboone100 avatar Mar 08 '16 20:03 jboone100
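
The two dispatch strategies described above can be sketched in plain Python. This is only an analogy for the C client's behavior, not its implementation: `scan_sequential`, `scan_concurrent`, and the `cluster` dictionary are hypothetical names, and Python threads stand in for the C client's per-node threads.

```python
import threading


def scan_node(records, callback):
    # Deliver every record held by one node to the callback.
    for record in records:
        callback(record)


def scan_sequential(records_by_node, callback):
    # Analog of concurrent=False: the calling thread visits each node in turn.
    for records in records_by_node.values():
        scan_node(records, callback)


def scan_concurrent(records_by_node, callback):
    # Analog of concurrent=True: one worker thread per node, and the caller
    # blocks until every worker has finished.
    workers = [
        threading.Thread(target=scan_node, args=(records, callback))
        for records in records_by_node.values()
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


def make_recorder(log, lock):
    # Build a callback that notes each record and the thread that delivered it.
    def callback(record):
        with lock:
            log.append((record, threading.current_thread().name))
    return callback


cluster = {"node-a": [1, 2], "node-b": [3, 4]}  # hypothetical two-node cluster
caller = threading.current_thread().name
lock = threading.Lock()

sequential_log, concurrent_log = [], []
scan_sequential(cluster, make_recorder(sequential_log, lock))
scan_concurrent(cluster, make_recorder(concurrent_log, lock))

print("sequential callbacks ran on:", {name for _, name in sequential_log})
print("concurrent callbacks ran on:", {name for _, name in concurrent_log})
```

In the sequential variant every callback runs on the calling thread. In the concurrent variant the callbacks run on worker threads while the caller waits in `join()`.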

I see--thanks for clarifying.

I'm running my test program against my dev cluster, which has just 1 node.

Does the C client call the foreach callback directly? Or is control passed back to the Python client layer for each record?

RonRothman avatar Mar 09 '16 01:03 RonRothman

The callback goes from the C client -> Python client -> user code.

Returning false from the user callback should halt the scan.

jboone100 avatar Mar 09 '16 20:03 jboone100
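
A minimal sketch of the "return False to halt" contract mentioned above. The `foreach` function here is a hypothetical stand-in loop, not the Aerospike client's `scan.foreach`, and `StopAfter` is an illustrative helper. Records are shaped as `(key, meta, bins)` tuples as in the earlier example.

```python
def foreach(records, callback):
    """Hypothetical stand-in for a scan loop; returns how many records were delivered."""
    delivered = 0
    for record in records:
        delivered += 1
        # Only an explicit False halts the loop; None (no return value) continues.
        if callback(record) is False:
            break
    return delivered


class StopAfter:
    """Callback that asks the scan to halt once `limit` records have been seen."""

    def __init__(self, limit):
        self.limit = limit
        self.seen = []

    def __call__(self, record):
        key, meta, bins = record
        self.seen.append(bins)
        if len(self.seen) >= self.limit:
            return False


records = [
    (("mynamespace", "myset", 1), {}, {"bin1": "val1"}),
    (("mynamespace", "myset", 2), {}, {"bin1": "val2"}),
    (("mynamespace", "myset", 3), {}, {"bin1": "val3"}),
]

stopper = StopAfter(limit=2)
delivered = foreach(records, stopper)
print("delivered", delivered, "of", len(records), "records")
```

With the real client, a callback of the same shape could be passed to `scan.foreach`. Note that this stops a scan from inside the callback and is separate from the CTRL-C problem reported in this issue.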

> The callback goes from the C client -> Python client -> user code.

I see. Does the Python client release the GIL before calling the user code?

> Returning false from the user callback should halt the scan.

Got it, thanks. Good to know; but just to be clear, that isn't related to this issue.

RonRothman avatar Mar 09 '16 23:03 RonRothman

The Python client releases the GIL before calling the C client and then reacquires it in the callback from the C client before calling into user code.

Yeah, I was just trying to answer the "Are Scans Interruptable?" question in the "happy path". This is clearly not the happy path. :-)

jboone100 avatar Mar 10 '16 02:03 jboone100
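
The same release-then-reacquire shape can be observed from pure Python with `ctypes`: functions loaded through `ctypes.CDLL` are called with the GIL released, and a `CFUNCTYPE` callback reacquires it before running Python code. The sketch below is an analogy using the C library's `qsort`, not the Aerospike client's code, and it assumes a POSIX system where the C library can be loaded this way.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) exposes the
# symbols already loaded into the process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature of the comparison callback: int (*)(const int *, const int *)
COMPARE = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, COMPARE]
libc.qsort.restype = None

comparisons = []


def compare(a, b):
    # Python code runs here, so ctypes has reacquired the GIL before this
    # point, the same role the Python client plays before calling user code.
    comparisons.append((a[0], b[0]))
    return (a[0] > b[0]) - (a[0] < b[0])


values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)

# The GIL is released for the duration of the C call and taken again only
# while each callback invocation runs.
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), COMPARE(compare))

sorted_values = list(values)
print(sorted_values)
```

In a C extension the equivalent calls are `PyEval_SaveThread` / `PyEval_RestoreThread` around the blocking call and `PyGILState_Ensure` / `PyGILState_Release` inside the callback.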

Can you tell me whether I'm reading the code correctly? I'm looking here:

https://github.com/aerospike/aerospike-client-python/blob/master/src/main/scan/foreach.c#L58

and I see exactly what you described: the Python client callback function acquires the GIL before calling out to the Python user code callback.

But when I look here:

https://github.com/aerospike/aerospike-client-python/blob/master/src/main/scan/foreach.c#L156

it looks like the GIL is /not/ released before the Python client calls the C client's foreach function. My understanding is thus: the GIL is held by the Python client for the duration of the entire scan, rather than only for the duration of each callback invocation, as designed.

  1. I'm not sure I'm reading the code correctly. Is my assessment correct?
  2. Even if it is correct (i.e. there is a bug in the GIL handling during scans), is that the reason for the non-interruptible scans?
  3. Will correcting this bug (presumably by releasing the GIL before the call to aerospike_scan_foreach) cause a performance hit, since the GIL would then actually be acquired and released each time through the callback?

Thanks!

RonRothman avatar Mar 10 '16 16:03 RonRothman

Line 153 releases the GIL: PyThreadState * _save = PyEval_SaveThread();

I will be looking at this along with the fork() issue today. These are both somewhat complex interactions between the Python client and the C client, so it may take a few days to resolve.

jboone100 avatar Mar 10 '16 16:03 jboone100

Ah, I thought that PyEval_SaveThread didn't release the GIL, but you're right, I see now that it does. Thanks!

RonRothman avatar Mar 10 '16 17:03 RonRothman