
Why does load_balanced_view use so much memory?

Open GF-Huang opened this issue 4 years ago • 14 comments

image

GF-Huang avatar Jul 28 '21 03:07 GF-Huang

The IPython client is spending all its time serializing 100k messages here. The main thing is that a load-balanced view creates one message per item by default, which is 100k tasks in this case. That's a lot! The second is that IPython has special handling of numpy arrays, preserving the arguments as numpy arrays through serialization and reconstructing the result as a numpy array. That means that IPython is creating 100,000 single-element numpy arrays to send and then serializing them. This saves a lot when sending large arrays, but costs a lot when sending a very large number of tiny arrays. It's not sending 1 100k times, it's sending np.array([1]) 100k times. That's where ~all the time and memory is being spent.

You'll see much better behavior if you don't use a numpy array for this very simple case, or if you use e.g. chunksize=1000 to reduce the number of messages.
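As a rough sketch of why chunksize matters, the snippet below counts how many task messages a map over 100k items needs with and without chunking. The helper names message_count and chunked_map are hypothetical and only for illustration; chunked_map assumes `view` is a LoadBalancedView and that the caller passes a plain Python sequence rather than a numpy array.

```python
import math


def message_count(n_items, chunksize=1):
    """Number of task messages needed to cover n_items at a given chunksize."""
    return math.ceil(n_items / chunksize)


def chunked_map(view, func, items, chunksize=1000):
    """Submit `items` in chunks (hypothetical helper).

    Assumes `view` is an ipyparallel LoadBalancedView and `items` is a plain
    Python sequence (list, range), not a numpy array.
    """
    return view.map_async(func, items, chunksize=chunksize)


print(message_count(100_000))        # 100000 messages with the default chunksize=1
print(message_count(100_000, 1000))  # 100 messages with chunksize=1000
```

With a live cluster, this would be called as something like chunked_map(view, f, range(100_000)) and the result collected with .get().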

You might also consider the new LoadBalancedView.imap which sends messages more efficiently when you have a very large input stream by only submitting a limited number of messages and waiting for results before preparing and serializing more.
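To illustrate the idea of keeping only a limited number of messages in flight, here is a minimal stand-in built on the standard library's concurrent.futures. It is not ipyparallel's implementation of imap; the function name bounded_imap and its max_outstanding parameter are assumptions made for this sketch.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def bounded_imap(func, iterable, max_outstanding=4):
    """Yield func(x) in input order with at most max_outstanding tasks in flight.

    Illustrative stand-in only: new work is submitted lazily, so a very large
    input stream is never prepared all at once.
    """
    pending = deque()
    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        for item in iterable:
            pending.append(pool.submit(func, item))
            # Once the window is full, wait for the oldest result
            # before pulling the next item from the input.
            if len(pending) >= max_outstanding:
                yield pending.popleft().result()
        # Drain whatever is still in flight after the input is exhausted.
        while pending:
            yield pending.popleft().result()


for value in bounded_imap(lambda x: x * 2, range(5), max_outstanding=2):
    print(value)
```

The trade-off is the same one described above: memory stays bounded by the window size, at the cost of not having every task queued up front.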

minrk avatar Aug 02 '21 10:08 minrk

How do I use wait_interactive for an imap result?

GF-Huang avatar Aug 04 '21 07:08 GF-Huang

imap returns a generator, unlike other methods that return AsyncResult objects, so AsyncResult methods such as wait_interactive are not available. You can use tqdm directly, if you like:

import tqdm

source = range(1024)
# view is a LoadBalancedView, e.g. rc.load_balanced_view()
gen = view.imap(lambda x: x, source)
for result in tqdm.tqdm(gen, total=len(source)):
    ...

minrk avatar Aug 04 '21 13:08 minrk

I don't know why it takes a long time but shows no progress.

image

You can see it only takes 30+ ms per combination when run locally.

image

GF-Huang avatar Aug 05 '21 02:08 GF-Huang

After you cancel, what do you get for c.queue_status()?

minrk avatar Aug 05 '21 06:08 minrk

It has been more than 1 minute so far and it is still stuck.

image

GF-Huang avatar Aug 05 '21 06:08 GF-Huang

I took a look at the Windows Task Manager: python shows no CPU usage, but it is still stuck.

GF-Huang avatar Aug 05 '21 07:08 GF-Huang

Do simple executions work? rc[:].apply_sync(os.getpid) and view.apply_sync(os.getpid)?

minrk avatar Aug 05 '21 07:08 minrk

It is still stuck; I think I should restart the kernel.

image

GF-Huang avatar Aug 05 '21 07:08 GF-Huang

After cancelling, it is still stuck, perhaps because the cluster has already been released by the with ... as ... block?

image

GF-Huang avatar Aug 05 '21 07:08 GF-Huang

Can you call getpid before the code that's causing the problem?

minrk avatar Aug 05 '21 08:08 minrk

It seems to work well.

image

GF-Huang avatar Aug 05 '21 08:08 GF-Huang

Does map work with a simpler operation (start with echo, maybe return the same data type as your real task)? I'm trying to isolate the issue. A hang is certainly weird unless the processes are actually stuck on something. It's very strange that queue_status() would hang, since that doesn't involve the engines at all. Make sure you are calling that within the context manager if you are using it, or otherwise while the client's connection is still open and the controller still running.
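The isolation steps suggested in this thread could be bundled roughly as follows. The helper check_cluster and the echo function are hypothetical names for this sketch; it assumes `rc` is a connected ipyparallel Client and `view` is a load-balanced view from it, and that it is called while the cluster is still running (inside the with block, if one is used).

```python
import os


def echo(x):
    """Trivial task: return the input unchanged."""
    return x


def check_cluster(rc, view, sample):
    """Run progressively larger checks to find where a hang starts.

    Assumes `rc` is an ipyparallel Client and `view` a LoadBalancedView.
    """
    report = {}
    # 1. Controller only: does not involve the engines at all.
    report["queue_status"] = rc.queue_status()
    # 2. One trivial call on every engine, then one via the load-balanced view.
    report["all_pids"] = rc[:].apply_sync(os.getpid)
    report["one_pid"] = view.apply_sync(os.getpid)
    # 3. A map over a small sample with a trivial function.
    report["echo"] = list(view.map_sync(echo, sample))
    return report
```

If step 1 already hangs, the problem is likely in the client's connection to the controller rather than in the task itself.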

minrk avatar Aug 06 '21 07:08 minrk

It seems very slow. My machine has 24 cores.

image

GF-Huang avatar Aug 06 '21 12:08 GF-Huang