pandarallel
ValueError: Number of processes must be at least 1
[6887] Failed to execute script sbp
Traceback (most recent call last):
File "sbp.py", line 565, in
Same here. I'm running Python 3.8.5 in a Slurm environment (1 node, 16 cores) with a venv created from the following requirements.txt:
appdirs==1.4.4
arch==4.19
beautifulsoup4==4.9.3
bs4==0.0.1
certifi==2020.12.5
chardet==4.0.0
cssselect==1.1.0
cycler==0.10.0
Cython==0.29.23
dill==0.3.3
fake-useragent==0.1.11
feedparser==6.0.2
idna==2.10
kiwisolver==1.3.1
lxml==4.6.3
matplotlib==3.4.2
numpy==1.20.3
pandarallel==1.5.2
pandas==1.2.4
parse==1.19.0
patsy==0.5.1
Pillow==8.2.0
property-cached==1.6.4
pyee==8.1.0
pyparsing==2.4.7
pyppeteer==0.2.5
pyquery==1.4.3
python-dateutil==2.8.1
pytz==2021.1
requests==2.25.1
requests-html==0.10.0
scipy==1.6.3
sgmllib3k==1.0.0
six==1.16.0
soupsieve==2.2.1
statsmodels==0.12.2
tqdm==4.61.0
urllib3==1.26.4
w3lib==1.22.0
websockets==8.1
yahoo-fin==0.8.8
The particular call was a df.groupby(...).parallel_apply(...). During testing it worked both locally (4 cores) and on the cluster. What could be wrong here? Do you need more information? Thanks in advance! 🙂
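For reference, the call pattern looks roughly like this (a minimal sketch, not the actual script; the toy data, compute_stats, and the worker count are illustrative):
import pandas as pd
from pandarallel import pandarallel

# One worker per core; the cluster node above has 16.
pandarallel.initialize(nb_workers=16)

# Toy data standing in for the real data set.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]})

def compute_stats(group):
    # Placeholder for the actual per-group computation.
    return group["value"].mean()

result = df.groupby("key").parallel_apply(compute_stats)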
I just realized that I exceeded my disk quota. I will test whether that caused the issue tomorrow.
No, the issue persists. 🙁
Might be related to #115
Update: I was running my analysis on several data sets. For the one that got this error, some intermediate data frame was indeed empty.
I've had this several times; check whether df.empty == True. parallel_apply throws exceptions when the df is empty.
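For example, a guard along these lines (a minimal sketch; square and the empty frame are stand-ins for the real data and function) avoids handing an empty frame to pandarallel:
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

df = pd.DataFrame()  # stand-in for an intermediate frame that may be empty

def square(x):
    return x ** 2

# Fall back to plain apply when there is nothing to parallelize.
result = df.apply(square) if df.empty else df.parallel_apply(square)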
This also fails when working with a single-row dataframe, which is a bit frustrating since I use those for debugging, but I don't want to clutter my code with "if n_rows == 1 use apply else parallel_apply".
Has the package been updated to address single-row dataframes? My code is littered with try-except blocks. An easy solution might be to embed a try-except block in the package code, or at least to check if n_rows == 1.
try:
    paper_metadata = paper_metadata.parallel_apply(lambda x: x.astype(str))
except ValueError:
    # parallel_apply raises on empty/single-row frames; fall back to plain apply
    paper_metadata = paper_metadata.apply(lambda x: x.astype(str))
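If it helps, the same fallback can live in one place instead of being repeated; here is a sketch of a small wrapper (safe_parallel_apply is a hypothetical helper, not part of pandarallel):
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

def safe_parallel_apply(df, func, **kwargs):
    # Empty and single-row frames trip pandarallel, per the reports above.
    if len(df) <= 1:
        return df.apply(func, **kwargs)
    try:
        return df.parallel_apply(func, **kwargs)
    except ValueError:
        # e.g. "ValueError: Number of processes must be at least 1"
        return df.apply(func, **kwargs)

# Usage, mirroring the snippet above:
# paper_metadata = safe_parallel_apply(paper_metadata, lambda x: x.astype(str))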
Hi @itang1,
I can't really take care of this at the moment, but I would gladly review a pull request.
@till-m is this fixed, can I take this up?
Hey @skamdar, by all means, feel free to take this up!
I am only able to reproduce this issue with dataframes that have no columns and with empty series. Failing examples:
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

# Empty series
series = pd.Series(dtype=float)
series.parallel_apply(lambda x: x**2)

# Empty dataframe
df = pd.DataFrame()
df.parallel_apply(lambda x: x**2)
Since this has been idle for some time, I decided to raise a fix: #245.