pandarallel
OSX can not parallel urllib
General
- Operating System: OSX 13.3.1 (22E261)
- Python version: 3.8.15
- Pandas version: 2.0.0
- Pandarallel version: 1.6.4
Acknowledgement
- [x] My issue is NOT present when using pandas alone (without pandarallel)
- [x] If I am on Windows, I read the Troubleshooting page before writing a new bug report
Bug description
INFO: Pandarallel will run on 20 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
objc[48611]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.
objc[48611]: +[NSNumber initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
Observed behavior
pandas cannot be used with pandarallel on OSX. OSX also has no /dev/shm, so the memory file system cannot be used to transfer data.
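To make the /dev/shm point concrete, here is a minimal sketch of how one might check for a usable memory file system before initializing. The helper name `memory_fs_available` is hypothetical, and pandarallel's own detection logic may differ; this only tests whether the directory exists and is writable.

```python
import os


def memory_fs_available(path="/dev/shm"):
    """Return True if `path` is an existing, writable directory.

    Hypothetical helper for illustration; not part of pandarallel.
    """
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    # On OSX this is expected to report False, since /dev/shm does not exist there.
    print("memory file system available:", memory_fs_available())
```

The result could then be passed explicitly, for example `pandarallel.initialize(use_memory_fs=memory_fs_available())`, assuming you want to override the default behavior.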
Expected behavior
Code that works on Linux should run on OSX without errors.
Minimal but working code sample to ease bug fix for pandarallel team
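# Import the submodule explicitly; a bare "import urllib" does not guarantee urllib.request is loaded.
import urllib.request

import pandas as pd
from pandarallel import pandarallel

def func(item):
    # Each call performs one HTTP request; the item itself is unused.
    urllib.request.urlopen("http://www.python.org/")

df = pd.DataFrame({"data": range(20)})
pandarallel.initialize()
df["data"].parallel_apply(func)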
Hi @SysuJayce,
please try setting the environment variable: export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES. With that set, your example runs fine for me.
See this stackoverflow answer.
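For cases where exporting the variable in the shell is inconvenient, a rough equivalent is to launch the script as a child process with the variable already present in its environment. The sketch below assumes that is acceptable for your workflow; the helper name `run_with_fork_safety_disabled` is hypothetical, and the child command here only echoes the variable to show that it was inherited.

```python
import os
import subprocess
import sys


def run_with_fork_safety_disabled(args):
    """Run `args` as a child process with OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES set.

    Hypothetical helper for illustration. The variable is placed in the child's
    environment before it starts, which mirrors `export ...` in a shell.
    """
    env = dict(os.environ)
    env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Replace the "-c" command with the path to your own script.
    child_code = "import os; print(os.environ['OBJC_DISABLE_INITIALIZE_FORK_SAFETY'])"
    result = run_with_fork_safety_disabled([sys.executable, "-c", child_code])
    print(result.stdout.strip())
```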
Thanks @till-m, and sorry for leaving out that I run this code in Jupyter Lab.
With the environment variable set, the above code works in a shell, but in Jupyter Lab the problem still exists.
Hey @SysuJayce, could you try setting it using the env magic: %env OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES?
@till-m Sorry for the late reply. I have tried the env magic, but unfortunately I encounter the same error.
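Since the crash is tied to fork(), one way to sidestep it for I/O-bound work such as the URL fetch in the sample is to use threads instead of processes. This is a different technique, not a fix for pandarallel, and it only helps when the work is dominated by waiting on the network. The names `threaded_map` and `fetch_status` are hypothetical helpers written for this sketch.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def threaded_map(func, items, max_workers=8):
    """Apply `func` to each item using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


def fetch_status(url, timeout=10):
    """Fetch `url` and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    urls = ["http://www.python.org/"] * 3
    print(threaded_map(fetch_status, urls))
```

With a DataFrame, the same helper could be applied to a column, for example `threaded_map(func, df["data"])`, and the returned list assigned back as a new column.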
Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.
The code snippet works for me in Jupyter as expected, even when use_memory_fs=False.
- Python: 3.10.13
- Pandarallel: 1.6.5
- Pandas: 2.2.0
- Jupyter core: 5.7.1
- Jupyterlab: 4.1.0
I'm on Linux though.