
Using `requests` inside the mapped function causes troubles.

Open • zenodallavalle opened this issue 2 years ago • 1 comment

General

  • Mac OS X Ventura 13.1
  • Python 3.10.9
  • Pandas 1.5.2
  • Pandarallel 1.6.4

Acknowledgement

  • My issue is NOT present when using pandas alone (without pandarallel)

Bug description

Using requests inside the mapped function makes the call hang on macOS, and the following message is printed:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.

Observed behavior

The function gets stuck.

Expected behavior

The function runs fine.

Minimal but working code sample to ease bug fix for pandarallel team

Create a separate file called test_fns.py to hold the test functions:

class Downloader:
    def __init__(self, url) -> None:
        self.url = url

    def download(self) -> str:
        import requests

        r = requests.get(self.url)
        assert r.status_code == 200
        return r.text[:10]


def docs_example(x) -> float:
    import math
    
    return math.sin(x**2) + math.sin(x**2)


def request(x) -> str:
    return Downloader('https://www.google.com').download()

Create another file that contains the actual script:

import test_fns
import pandas as pd

from pandarallel import pandarallel

pandarallel.initialize(progress_bar=True)

if __name__ == '__main__':
    source = pd.Series(range(10))
    source.parallel_map(test_fns.docs_example)
    source.parallel_map(test_fns.request)

Other considerations:

Starting the worker processes in spawn mode instead of fork mode solves the problem. Spawn mode can be forced with:

import pandarallel
pandarallel.core.CONTEXT = pandarallel.core.multiprocessing.get_context('spawn')
pandarallel.pandarallel.initialize(progress_bar=True)
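The spawn-mode workaround above could be limited to macOS, so that other platforms keep pandarallel's default start method. The following is only a sketch: `needs_spawn` and `initialize_pandarallel` are hypothetical helper names, and `pandarallel.core.CONTEXT` is an internal attribute taken from this report, not a documented API, so it may change between releases.

```python
import multiprocessing
import platform


def needs_spawn(system: str) -> bool:
    """Return True for macOS, where platform.system() reports 'Darwin'."""
    return system == "Darwin"


def initialize_pandarallel(**kwargs) -> bool:
    """Hypothetical helper: force spawn mode on macOS, then initialize.

    Returns True if the start method was patched.
    """
    # Imported lazily so needs_spawn() stays usable without pandarallel.
    import pandarallel

    patched = needs_spawn(platform.system())
    if patched:
        # Same monkey patch as in the report; CONTEXT is internal to
        # pandarallel and is assumed to exist as in version 1.6.4.
        pandarallel.core.CONTEXT = multiprocessing.get_context("spawn")
    pandarallel.pandarallel.initialize(**kwargs)
    return patched
```

In the script this would replace the direct `pandarallel.initialize(progress_bar=True)` call with `initialize_pandarallel(progress_bar=True)`. With spawn, each worker re-imports the main module, so the `if __name__ == '__main__':` guard has to stay and the mapped functions have to live in an importable module such as test_fns.py.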

Another strange aspect: if you call the function once in the parent process (test_fns.request('')) before applying parallel_map, or simply make any request beforehand (for instance requests.get('http://github.com')), everything runs fine.
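The observation above suggests a second, less certain workaround: call the function once in the parent process before any worker is started. A minimal sketch follows, where `warm_up_then_run` is a hypothetical helper name. The report does not establish why the warm-up helps or whether it is reliable, so forcing spawn mode is probably the safer choice.

```python
from typing import Any, Callable


def warm_up_then_run(
    fn: Callable[[Any], Any],
    run_parallel: Callable[[], Any],
    sample: Any = "",
) -> Any:
    """Call fn once in the parent process, then start the parallel work."""
    # Warm-up call made in the parent, before any worker process exists.
    fn(sample)
    # Only now hand off to the parallel run.
    return run_parallel()
```

With the files from this report, the call would look like `warm_up_then_run(test_fns.request, lambda: source.parallel_map(test_fns.request))`.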

zenodallavalle, Feb 12 '23 15:02

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

nalepae, Jan 23 '24 09:01