pandarallel
Using `requests` inside the mapped function causes the process to hang.
General
- Operating system: macOS Ventura 13.1
- Python: 3.10.9
- Pandas: 1.5.2
- Pandarallel: 1.6.4
Acknowledgement
- My issue is NOT present when using `pandas` alone (without `pandarallel`)
Bug description
Using `requests` inside the mapped function causes problems and produces the following message:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
Observed behavior
The function gets stuck.
Expected behavior
The function runs fine.
Minimal but working code sample to ease bug fix for the `pandarallel` team
Create a separate file called `test_fns.py` to hold the test functions:
class Downloader:
    def __init__(self, url) -> None:
        self.url = url

    def download(self) -> str:
        import requests
        r = requests.get(self.url)
        assert r.status_code == 200
        return r.text[:10]

def docs_example(x) -> float:
    import math
    return math.sin(x**2) + math.sin(x**2)

def request(x) -> str:
    return Downloader('https://www.google.com').download()
Create another file that contains the actual script
import test_fns
import pandas as pd
from pandarallel import pandarallel
pandarallel.initialize(progress_bar=True)
if __name__ == '__main__':
    source = pd.Series(range(10))
    source.parallel_map(test_fns.docs_example)
    source.parallel_map(test_fns.request)
Other considerations:
Starting the worker processes in spawn mode solves the problem. Spawn mode can be forced by overriding the multiprocessing context before initializing:
import pandarallel
pandarallel.core.CONTEXT = pandarallel.core.multiprocessing.get_context('spawn')
pandarallel.pandarallel.initialize(progress_bar=True)
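For reference, the difference between start methods can be reproduced with the standard library alone. The following is only a minimal sketch of requesting a spawn context from `multiprocessing`; it does not use pandarallel, and the helper names `spawn_context` and `run_in_spawned_pool` are made up for illustration.

```python
import math
import multiprocessing as mp


def spawn_context():
    # Hypothetical helper: returns a context bound to the "spawn" start
    # method. Spawned workers start a fresh interpreter instead of
    # inheriting the parent's state through fork().
    return mp.get_context("spawn")


def run_in_spawned_pool(values):
    # Hypothetical helper: maps math.sqrt over the values in spawned workers.
    # math.sqrt is importable by the workers; a user-defined function would
    # need to live at module level in an importable file, as test_fns.py
    # does in this report.
    ctx = spawn_context()
    with ctx.Pool(processes=2) as pool:
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # The guard matters with spawn: workers re-import the main module,
    # so unguarded top-level code would run again in every worker.
    print(run_in_spawned_pool([0, 1, 4, 9]))
```

Spawn is slower to start than fork because each worker boots a new interpreter, but it avoids inheriting state from the parent process, which appears to be what the CoreFoundation message is complaining about.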
Another strange aspect: everything runs fine if you first call the function in the main process, without creating a new process (`test_fns.request('')`), before applying `parallel_map`. The same is true if you simply make any request before applying `parallel_map`, for instance `requests.get('http://github.com')`.