pandarallel
Processes stopped when passing large objects to the function to be parallelized
Problem:
Apply an NLP deep learning model for text generation over the rows of a pandas Series. The function call is:
out = text_column.parallel_apply(lambda x: generate_text(args, model, tokenizer, x))
where args and tokenizer are light objects, but model is a heavy one: a PyTorch model that weighs more than 6 GB on disk and takes up ~12 GB of RAM when running.
I have been doing some tests, and the problem arises only when I pass the heavy model to the function (even without actually running it inside the function), so the issue seems to be passing an argument that takes up a lot of memory. (Maybe related to the shared-memory strategy for parallel computing.)
After running parallel_apply, the output I get is:
INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
0.00% | 0 / 552 |
0.00% | 0 / 552 |
0.00% | 0 / 551 |
0.00% | 0 / 551 |
0.00% | 0 / 551 |
0.00% | 0 / 551 |
0.00% | 0 / 551 |
0.00% | 0 / 551 |
And it gets stuck there forever. Indeed, there are two processes spawned, and both are stopped:
ablanco+ 85448 0.0 4.9 17900532 12936684 pts/27 Sl 14:41 0:00 python3 text_generation.py --input_file input.csv --model_type gpt2 --output_file out.csv --no_cuda --n_cpu 8
ablanco+ 85229 21.4 21.6 61774336 57023740 pts/27 Sl 14:39 2:26 python3 text_generation.py --input_file input.csv --model_type gpt2 --output_file out.csv --no_cuda --n_cpu 8
Hello,
- First, could you tell me if this issue also arises with classical pandas? (If not, we are sure it is exclusively a pandarallel issue.)
- Could you also please try without the progress bar and without using the memory filesystem? (pandarallel.initialize(use_memory_fs=False))
I guess it won't work, but maybe it could give me more information about the topic.
Actually, to serialize lambda functions, pandarallel uses dill.
Because dill is very slow compared to classical Python serialization, pandarallel uses dill only to serialize the function to apply; the rest (the dataframe and so on) is serialized with standard Python serialization.
But unfortunately, in your case the function to apply is huge, because it contains model.
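To make this concrete, here is a minimal sketch (not pandarallel internals; payload is just a stand-in for model) showing how a lambda that closes over a large object drags that object into its dill payload:

import dill

def make_fn(payload):
    # payload is captured in the lambda's closure, so dill serializes
    # it together with the function itself
    return lambda x: (x, len(payload))

small_fn = make_fn(b"")                      # empty capture
heavy_fn = make_fn(bytes(50 * 1024 * 1024))  # ~50 MB capture

print(len(dill.dumps(small_fn)))  # a few hundred bytes
print(len(dill.dumps(heavy_fn)))  # roughly 50 MB

With a ~12 GB model in the closure, that serialization (and the matching deserialization in each worker) can take a very long time.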
Could you also tell me how much RAM you have, and the RAM usage during your pandarallel call?
And if you have time, could you try with only 2 workers? (Or even 1 worker. Of course, 1 worker is useless compared to classical pandas, but at least it uses the pandarallel mechanism.)
My guesses are the following:
- Either pandarallel is working, but the serialization of your model takes a long time, so the function to apply has not yet been fully received by the worker processes (the progress bars only really start to move once workers begin treating data; during (de)serialization they stay at 0%).
- Or you have run out of memory. pandarallel is optimized to consume as little RAM as possible for the dataframe, but the function to apply is copied n times in memory if you have n workers. Usually the function itself is very light.
Hi @nalepae, thank you for your detailed and fast answer.
- First, could you tell me if this issue also arises with classical pandas? (If not, we are sure it is exclusively a pandarallel issue.)
Yes, if I replace the parallel_apply function with the standard apply function, everything works correctly (but slowly).
- Could you also please try without the progress bar and without using the memory filesystem? (pandarallel.initialize(use_memory_fs=False))
Thanks for the suggestions. Same behaviour.
Could you also tell me how much RAM you have, and the RAM usage during your pandarallel call?
This is the output of free -m during the pandarallel call; I don't think free RAM is the problem:
total used free shared buff/cache available
Mem: 257672 52909 9537 51 195225 203649
Swap: 4095 590 3505
And if you have time, could you try with only 2 workers? (Or even 1 worker. Of course, 1 worker is useless compared to classical pandas, but at least it uses the pandarallel mechanism.)
I have just tried setting nb_workers=1 and nothing changes:
INFO: Pandarallel will run on 1 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
0.00% | 0 / 6 |
Please, tell me whatever you need and thanks again.
Also ran into this issue; it took forever to debug, as the offending argument was actually part of self...
I still have lots of RAM, so the serialization guess seems to be spot on, considering that on KeyboardInterrupt the traceback mostly goes into dill and pickle.
Here is reproducible code:
import numpy as np
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize(nb_workers=1, use_memory_fs=False)


class A:
    def __init__(self, var1):
        self.var1 = var1

    def f(self, *args):
        pass

    def run(self):
        df = pd.DataFrame(dict(a=np.random.rand(100)))
        df.apply(lambda x: self.f(x), axis=1)
        print("apply is ok")
        df.parallel_apply(lambda x: self.f(x), axis=1)  # hangs if self.var1 is too big
        print("parallel is ok")


if __name__ == "__main__":
    a_list = [1] * 1024 * 1024 * 1024
    a = A(a_list)
    a.run()
Produces:
INFO: Pandarallel will run on 1 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
apply is ok
And hangs...
Appreciate your work! @nalepae
Currently fixed by upgrading Python from 3.7.4 to 3.7.6; apparently the problem was with pickle.
For those wondering why a single process runs indefinitely with no results: I was on 3.6.4, and upgrading to 3.7.6 fixed the issue. Still no luck with progress bars, sadly.
I got around this by setting the function parameters to global variables.
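For the record, a hedged sketch of that global-variable workaround, assuming a fork-based multiprocessing start method (the Linux default): a module-level global assigned before the workers are forked is inherited by them copy-on-write instead of travelling through the lambda's closure, so only a tiny function has to be serialized. BIG and apply_fn are illustrative names, and the NumPy array is just a stand-in for a heavy model.

import numpy as np
import pandas as pd
from pandarallel import pandarallel

BIG = None  # module-level slot for the heavy object

def apply_fn(x):
    # references the global instead of capturing it, so the serialized
    # function stays tiny; forked workers inherit BIG from the parent
    return x + BIG[0]

if __name__ == "__main__":
    BIG = np.ones(50 * 1024 * 1024)  # ~400 MB stand-in for the model
    pandarallel.initialize(nb_workers=2)
    s = pd.Series(np.random.rand(100))
    print(s.parallel_apply(apply_fn).head())

Under a spawn start method (e.g. on Windows) the global would not be inherited by the workers, so this trick does not carry over as-is.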