pandarallel
parallel_apply never starts processing
ISSUE: Progress on the parallel_apply never starts going up.
I am trying to use parallel_apply to populate new columns on a data frame. This takes about 50 minutes with normal apply, but every column is independent so it should be easily parallelizable.
I am using the following to initialize:
pandarallel.initialize(nb_workers=8, progress_bar=True, use_memory_fs=False)
OUTPUT:
INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
And this is my parallel_apply call:
allowed_types_list = ['...', '...', ..., '...']
data["allowed"] = data["type"].parallel_apply(lambda x: 1 if x in allowed_types_list else 0)
The shape of my dataframe is: (4717892, 8)
I tried the same thing with a different function that takes around 5 seconds with apply, and the same thing happens.
I tried it on my local computer (running macOS with an i9, using pipe for data transfer) and on Google Colab (where I had 4 cores, using the memory file system for data transfer). Same behavior on both.
Am I missing something?
As a side note, is it possible to get the progress bars working on Google Colab?
For your last question: https://stackoverflow.com/questions/64754814/pandarallel-widgets-dont-work-on-google-colab
@pablokvitca Could you try initializing without the progress_bar? I faced a similar issue and was able to run pandarallel without the progress_bar. If you are using Jupyter Notebook (since you mentioned Colab), you can use the magic %time to see the time taken for the process.
pandarallel.initialize(nb_workers=8, use_memory_fs=False)
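For anyone running from a terminal rather than a notebook, where %time is not available, a rough stand-in can be sketched with the standard library. The helper name `timed` is hypothetical, and the workload below is only a placeholder so the snippet runs on its own; the pandarallel calls appear as comments because they depend on your own data.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


# Placeholder workload so the sketch is runnable by itself. In the setting
# of this thread the body would instead be something like:
#   pandarallel.initialize(nb_workers=8, use_memory_fs=False)
#   with timed("parallel_apply"):
#       data["allowed"] = data["type"].parallel_apply(
#           lambda x: 1 if x in allowed_types_list else 0)
timings = {}
with timed("stand-in workload", timings):
    total = sum(i * i for i in range(100_000))
```

This only reports the total elapsed time once the block finishes; it does not replace the live progress bar.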
Thanks @MohitJuneja. Setting progress_bar=False fixed the issue for me. This is annoying though because the progress bars are extremely useful. I'm just running this in the terminal. Does anyone know why the progress bars cause the program to hang?
I am having the same issue; with progress bars I never actually get the processing to work (checking htop to see CPU usage, there's an immediate spike and then it all drops away). Turning off progress bars (a bummer) does let it work.
I'm facing the same problem on a 13-inch M1 MacBook Pro. Turning off the progress bar doesn't help.
Same problem here. Turning off the progress bar works.
It looks like the problem starts with big dataframes. If I use fewer rows, then the process (with progress bars) works.
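Given the reports that smaller inputs behave and that progress_bar=False avoids the hang, one possible workaround is to keep the progress bar off and print coarse progress yourself, one chunk at a time. This is an untested sketch, not a confirmed fix: `iter_chunks` and `apply_in_chunks` are hypothetical helpers, and a plain list stands in for the column so the snippet runs without pandas.

```python
def iter_chunks(n_rows, chunk_size):
    """Yield (start, stop) index pairs that cover range(n_rows) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def apply_in_chunks(values, func, chunk_size, report=print):
    """Apply func to every value, reporting progress after each chunk."""
    out = []
    n = len(values)
    for start, stop in iter_chunks(n, chunk_size):
        # With pandas + pandarallel (progress_bar=False), this step would be
        # roughly: data["type"].iloc[start:stop].parallel_apply(func),
        # with the per-chunk results joined afterwards via pd.concat.
        out.extend(func(v) for v in values[start:stop])
        report(f"{stop}/{n} rows done ({100 * stop / n:.0f}%)")
    return out


# Hypothetical stand-in data; the real list of allowed types is elided above.
allowed = {"a", "b"}
values = ["a", "c", "b", "d", "a"]
flags = apply_in_chunks(values, lambda x: 1 if x in allowed else 0, chunk_size=2)
```

Larger chunks mean less per-chunk overhead but coarser progress updates, so the chunk size is a trade-off to tune for your data.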
Same issue. Any idea why?
Same problem on M1 Pro.
Same issue using pandarallel==1.6.3 on Jupyter Notebook. Setting progress_bar=False worked for me, but it hurts usability.
Same issue here using pandarallel==1.6.1, Python 3.9.5, and pandas 1.4.2. In my case I noticed it because the CPU time on the compute node stopped increasing. I had set progress_bar=True and use_memory_fs=False.