pandarallel

OSX can not parallel urllib

Open SysuJayce opened this issue 1 year ago • 6 comments

General

  • Operating System: OSX 13.3.1 (22E261)
  • Python version: 3.8.15
  • Pandas version: 2.0.0
  • Pandarallel version: 1.6.4

Acknowledgement

  • [x] My issue is NOT present when using pandas alone (without pandarallel)
  • [x] If I am on Windows, I read the Troubleshooting page before writing a new bug report

Bug description

INFO: Pandarallel will run on 20 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
objc[48611]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.
objc[48611]: +[NSNumber initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

Observed behavior

OSX cannot use pandas with pandarallel. OSX also has no /dev/shm, so the memory file system cannot be used to transfer data.

Expected behavior

Code that works on Linux should run on OSX without errors.

Minimal but working code sample to ease bug fix for pandarallel team

import urllib.request

import pandas as pd

from pandarallel import pandarallel


def func(item):
    urllib.request.urlopen("http://www.python.org/")


df = pd.DataFrame({"data": range(20)})
pandarallel.initialize()
df["data"].parallel_apply(func)

SysuJayce avatar Apr 10 '23 06:04 SysuJayce

Hi @SysuJayce,

Please try setting the environment variable with export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES. With it set, your example runs fine for me.

See this stackoverflow answer.
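
A minimal sketch of the same idea from Python, in case exporting the variable in the shell is not convenient. It assumes the variable only takes effect when it is already present in the environment of a process at startup, so the script is launched as a fresh child process with the variable set. The helper names and the script name `repro.py` are hypothetical and not part of pandarallel.

```python
import os
import subprocess
import sys


def env_with_fork_safety_disabled(base_env=None):
    """Return a copy of the environment with the Objective-C fork-safety check disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    return env


def run_with_fork_safety_disabled(command):
    """Start `command` as a new process whose environment already contains the variable."""
    return subprocess.run(command, env=env_with_fork_safety_disabled(), check=False)


# Hypothetical usage: repro.py would hold the pandarallel code sample from the report.
# run_with_fork_safety_disabled([sys.executable, "repro.py"])
```

The environment is copied rather than modified in place, so the calling process itself is left unchanged.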

till-m avatar Apr 14 '23 14:04 till-m

OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Thanks @till-m, and sorry for leaving out that I run this code in JupyterLab.

Setting the environment variable makes the above code work in a shell, but in JupyterLab the problem persists.

SysuJayce avatar Apr 17 '23 15:04 SysuJayce

Hey @SysuJayce, could you try setting it using the env magic %env OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES?

till-m avatar Apr 25 '23 13:04 till-m

%env OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

@till-m Sorry for the late reply. I have tried the env magic, but unfortunately I encountered the same error.
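
For reference, a sketch of a different approach that sidesteps the fork entirely. This is not pandarallel and not a fix for it: since the sample function is I/O-bound (it only opens a URL), a thread pool from the standard library may be enough, and threads do not fork the process. The names `fetch_status` and `thread_apply` are hypothetical helpers introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch_status(url, timeout=10):
    """Open the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def thread_apply(values, func, max_workers=8):
    """Apply func to every value using threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, values))


# Usage with the DataFrame from the report (requires network access):
# df["status"] = thread_apply(df["data"], lambda _: fetch_status("http://www.python.org/"))
```

This only helps for I/O-bound work; CPU-bound functions would still need separate processes.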


SysuJayce avatar May 06 '23 12:05 SysuJayce

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

nalepae avatar Jan 23 '24 09:01 nalepae

The code snippet works for me in Jupyter as expected, even when use_memory_fs=False.

  • Python: 3.10.13
  • Pandarallel: 1.6.5
  • Pandas: 2.2.0
  • Jupyter core: 5.7.1
  • JupyterLab: 4.1.0

I'm on Linux though.

shermansiu avatar Apr 27 '24 10:04 shermansiu