Issue running AsyncHTMLSession with multiprocessing

Open brandhsu opened this issue 3 years ago • 1 comment

Hi, I am having trouble running multiple AsyncHTMLSession sessions with the multiprocessing module.

Environment

  • Python: 3.10
  • requests-html: 0.10.0 (Author: Kenneth Reitz)

Code

from requests_html import AsyncHTMLSession
from multiprocessing import Pool


def single_session(url):
    asession = AsyncHTMLSession()

    async def get():
        r = await asession.get(url)
        r = await r.html.arender()
        return r

    return asession.run(get)


def multiple_sessions(urls, processes=1):
    with Pool(processes=processes) as p:
        return p.map(single_session, urls)


assert single_session(url="https://python.org/")  # No issues
assert multiple_sessions(urls=["https://python.org/"])  # Big issues

Error

Calling multiple_sessions raises the following error:

"""
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
"""

I am seeing what looks like deadlock behavior, but I am not sure why.

brandhsu, Jan 29 '22 22:01

Have you tried using plain HTMLSession instead of AsyncHTMLSession?

Asynchronous functions run on an event loop, and an event loop lives in one thread of one process. When you wrap single_session(url) in a multiprocessing pool, every call has to build its own session and its own event loop inside a worker process, so the two concurrency models end up stacked on top of each other. To my knowledge that stacking rarely helps for I/O-bound work like this, and it makes failures harder to reason about. It is usually simpler to pick one model: either a synchronous session per worker process, or a single AsyncHTMLSession that runs all of the requests concurrently.
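To make the "one event loop, many requests" option concrete, here is a minimal sketch using only the standard library. The fetch coroutine is a hypothetical stand-in for the `asession.get(url)` and `r.html.arender()` calls in the original post; it does no network I/O, and the names fetch, fetch_all and run_all are illustrative only.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for `await asession.get(url)` followed by
    # `await r.html.arender()`; it only yields to the event loop once.
    await asyncio.sleep(0)
    return f"rendered {url}"


async def fetch_all(urls):
    # gather() schedules every coroutine on the same event loop, in one
    # thread of one process, and returns the results in input order.
    return await asyncio.gather(*(fetch(u) for u in urls))


def run_all(urls):
    # asyncio.run() creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop again.
    return asyncio.run(fetch_all(urls))
```

With a real AsyncHTMLSession the same shape would presumably apply: one session, several coroutines, no process pool. Whether that is fast enough depends on how heavy the rendering step is.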

https://superfastpython.com/python-concurrency-choose-api/
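Separately, the traceback in the original post points at a likely cause that has nothing to do with async: the pool is created at module top level, without the `if __name__ == '__main__':` idiom the error message asks for. On platforms where multiprocessing uses the "spawn" start method (the default on Windows and macOS), each worker re-imports the main module, reaches the top-level pool call, and tries to start workers of its own. A minimal sketch of the guarded layout follows, assuming that is what is happening here; fake_render is a hypothetical stand-in for the AsyncHTMLSession calls so that the example runs without requests-html or network access.

```python
import asyncio
from multiprocessing import Pool


async def fake_render(url):
    # Hypothetical stand-in for the AsyncHTMLSession get/arender calls.
    await asyncio.sleep(0)
    return len(url)


def single_session(url):
    # Runs inside a worker process, which starts its own event loop here.
    return asyncio.run(fake_render(url))


def multiple_sessions(urls, processes=1):
    with Pool(processes=processes) as p:
        return p.map(single_session, urls)


if __name__ == "__main__":
    # Under "spawn", each worker re-imports this module. Code outside this
    # guard would run again in every worker and try to create another Pool,
    # which is what the RuntimeError above complains about.
    print(multiple_sessions(["https://python.org/"], processes=2))
```

In the original script, the equivalent change would be to move the two assert lines under the same guard. This does not settle whether AsyncHTMLSession itself behaves well inside a worker process; it only removes the bootstrapping error.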

anoduck, Dec 10 '23 09:12