
Setting progress_bar=True freezes execution for parallel_apply before reaching 1% completion on all CPUs

Open abhineetgupta opened this issue 4 years ago • 19 comments

When progress_bar=True, I noticed that the execution of my parallel_apply task stopped right before all parallel processes reached the 1% progress mark. Here are some further details of what I encountered:

  • I turned on logging with DEBUG messages, but no messages were displayed when the execution stopped. There were no error messages either. The dataframe rows simply stopped processing further and the process seemed to be frozen.
  • I have two CPUs. It seems that the progress bar only updates in 1% increments. One of the progress bars reaches the 1% mark, but when the number of processed rows reaches the 2% mark (which I assume corresponds to the second progress bar updating to 1% as well), the process freezes.
  • The process runs fine with progress_bar=False.

abhineetgupta avatar Jan 28 '21 20:01 abhineetgupta

Similar issue here, except that once one process reaches 100% all others get stuck at 99.99%. Problem is completely fixed by turning off the progress bars (but I don't look quite leet enough /s).

Specs:

  • SageMaker ml.m5.4xl
  • Data ~2.6M rows
  • Using parallel_apply with a function that transforms sentences to tokens, lemmatizes, and then checks for the presence of a token.

bmacher-discovery avatar Feb 05 '21 21:02 bmacher-discovery

Same issue. 21M rows, Python 3.8, OSX 10.15.7.

I'm running parallel_apply, and 2 out of 12 bars finish. The others get stuck, and I'm getting a "python quit unexpectedly" error from the OS.

Ronserruya avatar Feb 11 '21 13:02 Ronserruya

Similar issue, and I'm only working on about 12k rows. It gets to about 300 completed items on each core, then all of the forked processes seem to die. It is almost as if it's trying to create new threads, but then it just sits there with all cores basically unused.

Python 3.6.9 on Ubuntu-18.04 WSL2

**Edit:** I removed the progress_bar enable in my little console application, and whatever deadlock was occurring seems to have disappeared. It is now progressing pretty well.

chris-forbes avatar Feb 23 '21 09:02 chris-forbes

Same issue here, I set the number of workers to 12 but 2 of them stopped with 1% progress.

zkx06111 avatar Feb 26 '21 12:02 zkx06111

I have the same issue, working on 111k rows, Python 3.8.

CptPirx avatar May 12 '21 13:05 CptPirx

Same here. None of the processes make any progress.

I use parallel_apply on a groupby. It seems that the length of the groups is also not correctly recognized for the progress bar.

skwde avatar May 21 '21 09:05 skwde

Same, is there any workaround for it?

quancore avatar Jun 12 '21 16:06 quancore

Setting progress_bar=False worked for me.

abhineetgupta avatar Jun 12 '21 17:06 abhineetgupta

also experiencing this issue

Python 3.8 pandarallel 1.5.2 centos ~500k rows

It happens both with all bars below 1% and sometimes with most bars above 99%.

the workaround progress_bar=False also works for me, but it would be nice to have :)

neontty avatar Sep 02 '21 17:09 neontty

This happens to me too, but the workaround works.

kylegilde avatar Nov 30 '21 21:11 kylegilde

Same here in a ".parallel_apply(lambda)" Froze here: image

Lucas-Servi avatar Mar 04 '22 17:03 Lucas-Servi

Could you please tell me the version of pandarallel you are using?

nalepae avatar Mar 04 '22 17:03 nalepae

Name: pandarallel Version: 1.5.5

Lucas-Servi avatar Mar 04 '22 17:03 Lucas-Servi

Could you please try with Pandarallel 1.5.7?


nalepae avatar Mar 04 '22 17:03 nalepae

Sure, give me a minute. Just for the record, the execution reaches a point where the cores stop working while the cell is still running: image

Lucas-Servi avatar Mar 04 '22 17:03 Lucas-Servi

Sorry, I can't tell whether your issue is fixed with Pandarallel 1.5.7. If not, could you please provide:

  • Operating System:
  • Python version:
  • Pandas version:
  • Pandarallel version:

and a minimal code sample which reproduces the issue for me to investigate?
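
The four fields requested above can be gathered in one go. The following is only a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`). The helper names `package_version` and `environment_report` are made up for illustration and are not part of pandarallel.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_version(name: str) -> str:
    """Return the installed version of `name`, or a placeholder if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


def environment_report() -> str:
    """Build the four requested lines as a single string."""
    lines = [
        f"Operating System: {platform.platform()}",
        f"Python version: {platform.python_version()}",
        f"Pandas version: {package_version('pandas')}",
        f"Pandarallel version: {package_version('pandarallel')}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed block into the issue, together with the code sample.
    print(environment_report())
```

Reading the versions from the installed package metadata, rather than from memory, avoids reporting a version that is not the one active in the current environment.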

nalepae avatar Mar 04 '22 17:03 nalepae

image Stopped here.

Operating System: Linux Mint 20.3
Kernel: Linux 5.13.0-27-generic
Python version: Python 3.9.5
Pandas version: 1.4.1
Pandarallel version: 1.5.7

I made a little folder with code + 2 dataframes used. https://easyupload.io/w9mbcv

Hope it helps!

Thanks for Pandarallel, it's amazing :)!

Lucas-Servi avatar Mar 04 '22 18:03 Lucas-Servi

Hello,

I do reproduce your issue with pandarallel 1.5.5, but I do not reproduce your issue with pandarallel v1.5.7. Are you totally sure you tried it with pandarallel 1.5.7?

To know the current version of pandarallel you are using:

import pandarallel

pandarallel.__version__

To be sure you install the latest version of pandarallel:

pip install pandarallel --upgrade

(I guess you are not using pandarallel v1.5.7, since by default this version uses only half of the available CPUs. I see on your htop screenshot that you have 16 CPUs and also 16 progress bars.)

nalepae avatar Mar 07 '22 10:03 nalepae

Yes, but I'm testing it on 8 or 4 cores now and it is still not working. This was my best shot after a clean install in a new env. image

Running on 1.5.7. It usually runs perfectly; I just had trouble with this particular script. Thanks for the support, I'm going to try something different. :)

Lucas-Servi avatar Mar 07 '22 13:03 Lucas-Servi

I'm assuming this has been fixed.

till-m avatar Sep 12 '22 14:09 till-m

@nalepae @till-m I am still encountering this issue in both version 1.5.7 and 1.6.3. Some cores fail to progress and freeze, both with progress_bar=True and progress_bar=False.

parthpankajtiwary avatar Dec 08 '22 22:12 parthpankajtiwary

I got it to work. Couple of observations:

  • I was working on Windows, so anything that touches the CUDA drivers before multiprocessing starts will not sit well with multiprocessing. In my case I was importing cudf, so I separated that logic.
  • I was passing a model (700 MB) as an argument to the function supplied to parallel_apply, and that seems to have been a bottleneck. As a workaround, I initialised the model as a global variable instead of passing it to the function, and it seems to have worked fine.
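
The second observation can be sketched roughly as follows. This is an illustration only: `KeywordModel`, `load_model`, `get_model`, and `flag_row` are hypothetical stand-ins, not the actual code and not part of pandarallel. The assumption is that the row function looks the model up in a module-level global, loaded once per process, instead of receiving it as an argument.

```python
_MODEL = None  # module-level cache, filled at most once per process


class KeywordModel:
    """Hypothetical stand-in for a large model (the 700 MB object above)."""

    def __init__(self, vocabulary):
        self.vocabulary = frozenset(vocabulary)

    def predict(self, text: str) -> bool:
        # True if any whitespace-separated token is in the vocabulary.
        return any(token in self.vocabulary for token in text.lower().split())


def load_model() -> KeywordModel:
    """Hypothetical loader; a real one would read weights from disk."""
    return KeywordModel(["deadlock", "freeze", "hang"])


def get_model() -> KeywordModel:
    """Return the cached model, loading it on first use in this process."""
    global _MODEL
    if _MODEL is None:
        _MODEL = load_model()
    return _MODEL


def flag_row(text: str) -> bool:
    """Row function that takes only the row value; the model is a global."""
    return get_model().predict(text)
```

With this shape, only `flag_row` is handed to parallel_apply, and the model itself is never part of the call's arguments. Whether argument serialization was the actual cause of the slowdown is an assumption; the comment above only reports that the change helped.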

parthpankajtiwary avatar Dec 16 '22 18:12 parthpankajtiwary

I am still getting this issue on pandarallel 1.6.5. If I set progress_bar=False I don't get any issues, but it would be great to be able to use this feature.

Using parallel_apply() it just hangs here - and the data table I am using here is tiny for testing (~1 MB) image

I am using an M2 Mac, but I think that should be fine from what I can see in the docs.

LukebethamStonehaven avatar Jul 05 '23 21:07 LukebethamStonehaven

Hi @LukebethamStonehaven,

can you consistently reproduce the problem like this? If yes, can you send me an SSCCE?

till-m avatar Jul 06 '23 09:07 till-m

I am facing a similar issue of parallel_apply() freezing when running my code on an EC2 cluster. It was working fine until a few days ago, running every day on a schedule, but it has suddenly stopped working. The same code runs fine on my local machine, though. I have also kept progress_bar=False. My pandarallel version is v1.6.4 on both local and EC2. Any ideas, guys?

sahil-zepto avatar Mar 06 '24 13:03 sahil-zepto