requests-html

OSError: [Errno 24] Too many open files

Open jjasonbell opened this issue 5 years ago • 7 comments

I'm using macOS Mojave 10.14.4, requests-html 0.10.0, and Python 3.6. I'm running the following inside a loop over a number of files:

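from requests_html import HTMLSession

# Runs once per iteration; url comes from the enclosing loop.
session = HTMLSession()
r = session.get(url)
r.html.render(retries=8, wait=2, sleep=2)  # launches headless Chromium via pyppeteer
date = r.html.search('Published on {date}"')['date']
session.close()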

Traceback:

File "date_scraper.py", line 26, in get_date
  r.html.render(retries=8, wait=2, sleep=sleep)
File "/anaconda3/lib/python3.6/site-packages/requests_html.py", line 586, in render
  self.browser = self.session.browser  # Automatically create a event loop and browser
File "/anaconda3/lib/python3.6/site-packages/requests_html.py", line 730, in browser
  self._browser = self.loop.run_until_complete(super().browser)
File "/anaconda3/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
  return future.result()
File "/anaconda3/lib/python3.6/site-packages/requests_html.py", line 714, in browser
  self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
File "/anaconda3/lib/python3.6/site-packages/pyppeteer/launcher.py", line 311, in launch
  return await Launcher(options, **kwargs).launch()
File "/anaconda3/lib/python3.6/site-packages/pyppeteer/launcher.py", line 169, in launch
  **options,
File "/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
  restore_signals, start_new_session)
File "/anaconda3/lib/python3.6/subprocess.py", line 1234, in _execute_child
  errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files

I found a similar issue associated with requests but haven't found a solution there yet.

jjasonbell, Jun 12 '19 13:06

I've encountered the same issue on both macOS High Sierra and Ubuntu Linux. My code was also wrapped in a loop. It would run fine initially, but at some point in the loop it would start raising [Errno 24] Too many open files. Is it possible that session.close() sometimes fails to close the browser instance properly?

pmdbt, Jul 17 '19 04:07

Hey, have you found a solution for this? I have the same problem right now...

alainmore, Nov 01 '20 01:11

@alainmore Sadly, I don't have a concrete fix because I don't know the root cause of the problem. I can say that the issue disappeared for me after I wrapped the requests-html portion of the code in a "with" statement. I also dockerized the entire script and deployed it on AWS. I don't know whether it was the combination of those changes or just one of them that fixed the issue for me.
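For readers wondering what the "with" wrapping could look like, here is a minimal sketch, not the code actually used in this thread. It assumes that HTMLSession works as a context manager (it inherits this from requests.Session) and that leaving the block calls close(), which is expected to shut down the headless browser as well. The function name scrape_dates and the session_factory parameter are hypothetical, added only so the loop can be exercised without launching Chromium. The sketch also reuses one session for the whole loop instead of creating one per URL, which is a separate change from what was described above.

```python
def scrape_dates(urls, session_factory=None):
    """Render each URL with one shared session and collect the published dates."""
    if session_factory is None:
        # Imported lazily so the function can be tested without requests-html.
        from requests_html import HTMLSession
        session_factory = HTMLSession

    dates = []
    # Leaving the "with" block calls session.close(), even if an exception is raised.
    with session_factory() as session:
        for url in urls:
            r = session.get(url)
            r.html.render(retries=8, wait=2, sleep=2)
            # search() returns None when the template does not match.
            result = r.html.search('Published on {date}"')
            dates.append(result['date'] if result else None)
    return dates
```

Whether this removes the leak on a given setup is not confirmed anywhere in this thread; it only guarantees that close() runs once per session.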

pmdbt, Nov 02 '20 01:11

My situation was time sensitive, and after trying several obvious fixes I went back to BeautifulSoup. @pmdbt's idea seems good; I can't remember whether I tried it.

jjasonbell, Nov 02 '20 07:11

> @alainmore I wrapped the requests-html portion of the code in a "with" statement.

Can you please give the code of how you did that?

It seems that each call to requests-html leaves one file (pipe) open, even when you call session.close() each time, and after about 240 calls the process fails with this error (macOS High Sierra in my case).
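One way to check this kind of leak is to compare the number of open descriptors against the per-process limit while the loop runs. The sketch below uses only the Python standard library and assumes a Unix-like system (macOS or Linux) where /dev/fd lists the current process's descriptors. The helper names fd_limits and open_fd_count are hypothetical. A default soft limit of 256 on macOS would be consistent with failures after roughly 240 calls, but that figure is an assumption to verify on your own machine.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Count descriptors currently open in this process.

    Listing /dev/fd briefly opens one extra descriptor, so the value is
    slightly inflated, but the inflation is the same on every call.
    """
    return len(os.listdir('/dev/fd'))
```

Printing open_fd_count() once per iteration should show whether the count climbs steadily toward the soft limit reported by fd_limits().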

varalgit, Aug 17 '21 19:08

@varalgit Not sure if you discovered this on your own, but your question was helpfully answered on Stack Overflow. Hope that helps!

jhaber-zz, Jul 25 '22 17:07