pandarallel
ValueError: Number of processes must be at least 1
[6887] Failed to execute script sbp
Traceback (most recent call last):
File "sbp.py", line 565, in
Same here. I'm running Python 3.8.5 in a Slurm environment (1 node, 16 cores) with a venv created from the following requirements.txt:
appdirs==1.4.4
arch==4.19
beautifulsoup4==4.9.3
bs4==0.0.1
certifi==2020.12.5
chardet==4.0.0
cssselect==1.1.0
cycler==0.10.0
Cython==0.29.23
dill==0.3.3
fake-useragent==0.1.11
feedparser==6.0.2
idna==2.10
kiwisolver==1.3.1
lxml==4.6.3
matplotlib==3.4.2
numpy==1.20.3
pandarallel==1.5.2
pandas==1.2.4
parse==1.19.0
patsy==0.5.1
Pillow==8.2.0
property-cached==1.6.4
pyee==8.1.0
pyparsing==2.4.7
pyppeteer==0.2.5
pyquery==1.4.3
python-dateutil==2.8.1
pytz==2021.1
requests==2.25.1
requests-html==0.10.0
scipy==1.6.3
sgmllib3k==1.0.0
six==1.16.0
soupsieve==2.2.1
statsmodels==0.12.2
tqdm==4.61.0
urllib3==1.26.4
w3lib==1.22.0
websockets==8.1
yahoo-fin==0.8.8
The particular call was a df.groupby(...).parallel_apply(...). During testing it worked both locally (4 cores) and on the cluster. What could be wrong here? Do you need more information? Thanks in advance! 🙂
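For reference, the call pattern looks roughly like this (a minimal sketch, not the actual script; the toy data, compute_stats, and the worker count are illustrative):
import pandas as pd
from pandarallel import pandarallel

# One worker per core; the cluster node above has 16.
pandarallel.initialize(nb_workers=16)

# Toy data standing in for the real data set.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]})

def compute_stats(group):
    # Placeholder for the actual per-group computation.
    return group["value"].mean()

result = df.groupby("key").parallel_apply(compute_stats)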
I just realized that I exceeded my disk quota. I will test whether that caused the issue tomorrow.
No, the issue persists. 🙁
Might be related to #115
Update: I was running my analysis on several data sets. For the one that got this error, some intermediate data frame was indeed empty.
I've had this several times; check whether df.empty == True. parallel_apply throws exceptions when the df is empty.
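For example, a guard along these lines (a minimal sketch; square and the empty frame are stand-ins for the real data and function) avoids handing an empty frame to pandarallel:
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

df = pd.DataFrame()  # stand-in for an intermediate frame that may be empty

def square(x):
    return x ** 2

# Fall back to plain apply when there is nothing to parallelize.
result = df.apply(square) if df.empty else df.parallel_apply(square)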
This also fails when working with a single-row dataframe, which is a bit frustrating since I use those for debugging, but I don't want to clutter my code with "if n_rows == 1 use apply else parallel_apply".
Has the package been updated to address single-row dataframes? My code is littered with try-except blocks. An easy solution might be to embed a try-except block in the package code, or at least to check if n_rows == 1.
try:
    paper_metadata = paper_metadata.parallel_apply(lambda x: x.astype(str))
except ValueError:
    # parallel_apply raises on empty/single-row frames; fall back to plain apply
    paper_metadata = paper_metadata.apply(lambda x: x.astype(str))
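If it helps, the same fallback can live in one place instead of being repeated; here is a sketch of a small wrapper (safe_parallel_apply is a hypothetical helper, not part of pandarallel):
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

def safe_parallel_apply(df, func, **kwargs):
    # Empty and single-row frames trip pandarallel, per the reports above.
    if len(df) <= 1:
        return df.apply(func, **kwargs)
    try:
        return df.parallel_apply(func, **kwargs)
    except ValueError:
        # e.g. "ValueError: Number of processes must be at least 1"
        return df.apply(func, **kwargs)

# Usage, mirroring the snippet above:
# paper_metadata = safe_parallel_apply(paper_metadata, lambda x: x.astype(str))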
Hi @itang1,
I can't really take care of this at the moment, but I would gladly review a pull request.
@till-m is this fixed, can I take this up?
Hey @skamdar, by all means, feel free to take this up!
I am only able to reproduce this issue with dataframes that have no columns and with empty series. Failing examples:
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

# Empty series
series = pd.Series(dtype=float)
series.parallel_apply(lambda x: x**2)

# Empty dataframe
df = pd.DataFrame()
df.parallel_apply(lambda x: x**2)
Since this has been idle for some time, I decided to raise a fix: #245.