When I call ''r.html.render()'', it raises the error 'Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.'

Open funiazi opened this issue 5 years ago • 26 comments

I wrote code like this:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get(url)

Then I wrote the following:

r.html.render()

It raises:

RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.

When I change my code to:

session = AsyncHTMLSession()
r = session.get(url)
r.html.render()

it tells me:

AttributeError: '_asyncio.Future' object has no attribute 'html'

How can I solve this problem? If I use AsyncHTMLSession, how do I load JavaScript?

This code runs in Spyder on Windows 10 with Python 3.7.
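A minimal sketch of why the second attempt reports a Future without an 'html' attribute: an async-style get hands back a pending result that has to be awaited before the response exists. The names blocking_fetch and async_get below are hypothetical stand-ins built only on the standard library, not the requests-html API.

```python
import asyncio
from types import SimpleNamespace


def blocking_fetch(url):
    # Hypothetical stand-in for a blocking HTTP request.
    return SimpleNamespace(url=url, html="<html>stub</html>")


def async_get(url):
    # Schedule the blocking call on a worker thread and return a Future,
    # which is not the response itself.
    loop = asyncio.get_running_loop()
    return loop.run_in_executor(None, blocking_fetch, url)


async def main():
    pending = async_get("https://example.org/")
    has_html_before = hasattr(pending, "html")  # the Future has no .html
    response = await pending                    # awaiting yields the response
    return has_html_before, response.html


has_html_before, html = asyncio.run(main())
print(has_html_before, html)
```

In an environment that already runs an event loop (Spyder, Jupyter), asyncio.run itself would fail, so the final call would have to be replaced by awaiting main() directly where the console supports it.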

funiazi avatar May 06 '19 08:05 funiazi

Hi brother! I seem to have a problem that is even more troublesome than the original poster's. This is my code:

async def main(self, **kwargs):
    r = session1.get('http://bbs.tianya.cn/post-free-6085404-1.shtml')
    r.html.render()
    print(r.html.html)

It raises:

RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.

I know the error is caused by making the call inside an 'async' function, but I can't find a good way around it. Can you help me? I would be grateful! (I don't know English; this is the result of Google Translate.)

falwiw avatar Jul 21 '19 10:07 falwiw

@funiazi Spyder already has its own event loop running (as do Jupyter Notebook, JupyterLab, etc.). Adding the following before your code should let you run it in Spyder (and Jupyter, etc.):

import asyncio
if asyncio.get_event_loop().is_running(): # Only patch if needed (i.e. running in Notebook, Spyder, etc)
    import nest_asyncio
    nest_asyncio.apply()

It works for me.

You will probably need to install the nest_asyncio package.
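To see whether the patch is needed at all, a small standard-library check can report whether the current code is executing inside a running event loop. This is only a sketch: in_running_loop and check_inside are hypothetical helper names, and the check uses asyncio.get_running_loop rather than the get_event_loop call shown above.

```python
import asyncio


def in_running_loop():
    # asyncio.get_running_loop() raises RuntimeError when no loop is running
    # in the current thread.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def check_inside():
    # Called from a coroutine, so a loop is running here.
    return in_running_loop()


outside = in_running_loop()            # plain script context
inside = asyncio.run(check_inside())   # inside a coroutine
print(outside, inside)
```

Run as a plain .py file, the first check reports no running loop, which matches the reports in this thread that the blocking code works outside Spyder and Jupyter.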

ustauss avatar Jul 31 '19 06:07 ustauss

@ustauss asyncio did not work for me unfortunately.

Schmidtbit avatar Jan 12 '20 20:01 Schmidtbit

Did anyone figure this out?

Schmidtbit avatar Jan 12 '20 20:01 Schmidtbit

@funiazi This should work


asession = AsyncHTMLSession()
r = await asession.get(API)
await r.html.arender()
resp=r.html.raw_html

html.raw_html seems to give the desired output

RichardPears avatar Feb 13 '20 10:02 RichardPears

Did anyone figure this out

sanwal-clarifai avatar Jun 17 '20 20:06 sanwal-clarifai

did anyone figure this out abeg

Fawaz441 avatar Jul 03 '20 21:07 Fawaz441

Had the same problem. Try this:

pip3 install -U requests[security]

Works for me.

akivanno avatar Jul 06 '20 10:07 akivanno

Anyone fixed this issue?

fayedraza avatar Jul 28 '20 08:07 fayedraza

Anyone fixed this issue?

@Grasshopper04 's response worked for retrieving html.

kylebeloin avatar Aug 13 '20 00:08 kylebeloin

I got it working with the following steps

  • Stop using jupyter notebook and run it as a python file
  • Don't use async code

YuTamura29 avatar Sep 27 '20 11:09 YuTamura29

@YuTamura29 You are right. I ran the same code in VS Code, and it gives the desired output :1st_place_medal:

0xdeepmehta avatar Sep 27 '20 11:09 0xdeepmehta

@YuTamura29 what did you use? I tried using PyCharm with this code:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('http://python-requests.org/')
r.html.render()

However, I got this error: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Win_x64/588429/chrome-win32.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

ShayHa avatar Oct 07 '20 15:10 ShayHa

I am using Python 3.8 and jupyter-notebook 6.1.3. With other libraries using asyncio it is possible simply to await coroutines directly from the notebook console, and for me, this works with requests_html up to a point.

The following code works as expected:

import asyncio
loop = asyncio.get_running_loop()
asession = AsyncHTMLSession(loop=loop)
r = await asession.get('https://www.google.com')
script = '''
       () => {return 'Hello World'}
'''
await r.html.arender(script=script, reload=False)

The output is Hello World.

However, if I call r.html.arender(script=script, reload=False) a second time, I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-66-620e73c0219e> in <module>
----> 1 await r.html.arender(script=script, reload=False)

/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/requests_html.py in arender(self, retries, script, wait, scrolldown, sleep, reload, timeout, keep_page)
    613         """ Async version of render. Takes same parameters. """
    614 
--> 615         self.browser = await self.session.browser
    616         content = None
    617 

/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/requests_html.py in browser(self)
    727             self.loop = asyncio.get_event_loop()
    728             if self.loop.is_running():
--> 729                 raise RuntimeError("Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.")
    730             self._browser = self.loop.run_until_complete(super().browser)
    731         return self._browser

RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.

The stack trace suggests that the session object has for some reason reverted to an instance of HTMLSession. And indeed, before the first call to r.html.arender, which succeeds, r.html.session appears to be an instance of AsyncHTMLSession. But after the (successful) call to arender, r.html.session is HTMLSession.

Other async coroutines in Jupyter work fine for me; it's only arender that is causing an error.

Perhaps this sheds some light on the foregoing comments? (Incidentally, I believe support for running asynchronous code in the IPython REPL, and by extension in Jupyter notebooks, was added with v. 7 for Python 3.6+.)

Update: I notice that in requests_html.py, line 703 reads as follows:

        html = HTML(url=self.url, html=content.encode(DEFAULT_ENCODING), default_encoding=DEFAULT_ENCODING)

This line appears within the HTML.arender coroutine definition. Here a new instance of the HTML class is created, but neither is the current session passed to the constructor, nor is the _async Boolean flag set to True.

If I am reading the code correctly, the result is that arender initializes an HTML object with an HTMLSession instance. Since this is the async render method, it seems as though it should use the AsyncHTMLSession instead.
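The mechanism described above can be illustrated with a toy model. This is not the library's code: Page, SyncSession, and AsyncSession are hypothetical stand-ins, and the model simply assumes the reading given in this comment, namely that a fresh object is built without the current session and its state then replaces the original's.

```python
class SyncSession:
    """Hypothetical stand-in for a blocking session."""


class AsyncSession:
    """Hypothetical stand-in for an async session."""


class Page:
    def __init__(self, content="", session=None):
        self.content = content
        # Without an explicit session, fall back to the blocking one.
        self.session = session or SyncSession()

    def refresh_dropping_session(self, content):
        # Builds the replacement without forwarding the session,
        # so the fallback silently takes over.
        fresh = Page(content)
        self.__dict__.update(fresh.__dict__)

    def refresh_keeping_session(self, content):
        # Forwards the current session to the replacement.
        fresh = Page(content, session=self.session)
        self.__dict__.update(fresh.__dict__)


kept = Page(session=AsyncSession())
kept.refresh_keeping_session("rendered")

dropped = Page(session=AsyncSession())
dropped.refresh_dropping_session("rendered")

kept_type = type(kept.session).__name__
dropped_type = type(dropped.session).__name__
print(kept_type, dropped_type)
```

If the reading is right, forwarding the session when the replacement object is built would be enough to keep the second arender call on the async path; that remains an assumption until checked against the library source.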

dolsysmith avatar Oct 28 '20 16:10 dolsysmith

Run it on Pycharm as a single .py file. It will run.

mayankrichu avatar Dec 14 '20 21:12 mayankrichu

I got it working with the following steps

  • Stop using jupyter notebook and run it as a python file
  • Don't use async code

My problem is that I am fetching this website session inside a function that is being called by an asynchronous event callback function inside my Discord bot Client, so even running this as a python file cannot fix my issue, sadly.

maxrdz avatar May 08 '21 21:05 maxrdz

Has anyone figured out how to get this working in jupyter yet?

theloni-monk avatar Jul 01 '21 22:07 theloni-monk

anything new about this issue?

yaron-shamul avatar Dec 14 '21 10:12 yaron-shamul

I would like to post my experience (sorry in advance for the long post). My code contains:

session = AsyncHTMLSession()
Source = "https://www.flashscore.it/tennis/"

print("\n Estrazione da ", Source, "\n \n")
response = await session.get(Source)  # Get the html content
await response.html.arender(timeout=6000, sleep=3)  # timeout=15,60
print(response.html.raw_html)
soup1 = bs4.BeautifulSoup(response.html.raw_html, 'html.parser')

This is just to scrape a couple of tennis stats. If I run it in Spyder, arender takes forever (I believe it hangs), and changing the "timeout" or "sleep" values makes no difference. The same happens if I run it with Python as a script (I can see the "Estrazione" print but nothing more). In Jupyter Notebook the kernel seems to be working on it (the icon is flashing), but the result is no different. Please let me know if I can add more information. I am running it with asyncio.
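One way to tell a genuine hang from a slow render is to put an outer deadline on the awaited call with asyncio.wait_for. The sketch below uses a hypothetical slow_render coroutine as a stand-in for the rendering call; whether cancelling a real render also cleans up the browser process is not something this example shows.

```python
import asyncio


async def slow_render():
    # Hypothetical stand-in for a render call that never finishes in time.
    await asyncio.sleep(10)
    return "rendered"


async def render_with_deadline(seconds):
    # wait_for cancels the inner coroutine once the deadline passes.
    try:
        return await asyncio.wait_for(slow_render(), timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out"


outcome = asyncio.run(render_with_deadline(0.05))
print(outcome)
```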

ElMastro avatar Dec 23 '21 13:12 ElMastro

I had the same issue while rendering. You have to make sure you're using await in front of the render call, and that you're using arender(), not render()...

That's what worked for me, but it won't work for all of you, because some of you are already using this function.

forest-cat avatar Jan 05 '22 03:01 forest-cat

Unfortunately that is not my case. The code already had await response.html.arender(timeout=6000, sleep=3)

It was working well until about a month ago, when it started running forever on reaching this command. Now I am using Selenium instead, because I couldn't make it work so far.

ElMastro avatar Jan 05 '22 15:01 ElMastro

For me it takes a few seconds to load too, but it does load. When it's not loading I don't know what to do, but it's good you found a way to do it with Selenium.

forest-cat avatar Jan 05 '22 20:01 forest-cat

Unfortunately this is a workaround. The only thing not working in requests-html is arender (for me), and it is a pity to switch libraries just for this. I could try to render the page outside, but it would not be the same (a lot more complex).



ElMastro avatar Jan 06 '22 21:01 ElMastro

I believe I need to add something: on httpx-html, the "arender" command shows the same behavior. Considering that all the other things work, probably in my case the problem is with javascript (on my computer)

ElMastro avatar Jan 13 '22 20:01 ElMastro

@funiazi This should work


asession = AsyncHTMLSession()
r = await asession.get(API)
await r.html.arender()
resp=r.html.raw_html

html.raw_html seems to give the desired output

@RichardPears solution works for me.

At first it didn't, because while trying to debug I had split each line into its own Jupyter cell. Obviously it didn't work in my loop when running cell by cell. Just wanted to share my dumb issue in case somebody else is doing the same.

L-Gagliardi avatar Jan 19 '22 09:01 L-Gagliardi

For rendering a local file:

from pathlib import Path
from requests_html import HTML, AsyncHTMLSession

session = AsyncHTMLSession()
html = HTML(html=Path('file.html').read_text(), session=session)
await html.arender()
print(html.html)

edublancas avatar Apr 20 '22 21:04 edublancas