requests-html
When I call 'r.html.render()', it raises the error 'Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.'
I wrote code like this:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get(url)
Then I wrote the following:
r.html.render()
It raises:
RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.
When I change my code to:
session = AsyncHTMLSession()
r = session.get(url)
r.html.render()
it tells me: AttributeError: '_asyncio.Future' object has no attribute 'html'
How can I solve this problem? If I use AsyncHTMLSession, how do I load JavaScript?
This code runs in Spyder on Windows 10 with Python 3.7.
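The AttributeError above is what one would expect when the object returned by session.get is still a pending future rather than a response. Here is a minimal sketch of that situation using only the standard library. blocking_get is a hypothetical stand-in for the HTTP request (it is not part of requests-html), and the executor call is an assumption about how such a future might be produced.

```python
import asyncio
from types import SimpleNamespace


def blocking_get():
    # Hypothetical stand-in for a blocking HTTP request.
    return SimpleNamespace(html="<p>hello</p>")


async def main():
    loop = asyncio.get_running_loop()
    # The call returns a future immediately; the response does not exist yet.
    future = loop.run_in_executor(None, blocking_get)
    missing_before_await = not hasattr(future, "html")
    # Awaiting the future yields the actual response object.
    response = await future
    return missing_before_await, response.html


missing_before_await, html = asyncio.run(main())
print(missing_before_await, html)
```

In other words, the result has to be awaited before attributes such as .html can be read from it.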
Hi brother! I seem to have a problem that is even more troublesome than the original poster's. This is my code:
async def main(self, **kwargs):
    r = session1.get('http://bbs.tianya.cn/post-free-6085404-1.shtml')
    r.html.render()
    print(r.html.html)
It raised: RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.
I know the error is caused by the call inside 'async', but I can't find a good way around it. Can you help me? Grateful! (I don't know English; this is the result of Google Translate.)
@funiazi Spyder already has its own event loop running (as do Jupyter Notebook, JupyterLab, etc.). Adding the following before your code should let you run it in Spyder (and Jupyter etc.):
import asyncio
if asyncio.get_event_loop().is_running():  # Only patch if needed (i.e. running in Notebook, Spyder, etc.)
    import nest_asyncio
    nest_asyncio.apply()
It works for me.
You will probably need to install the nest_asyncio package.
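For context, this RuntimeError is the usual outcome when synchronous code tries to drive an event loop that is already running, which is the situation inside Spyder or a notebook. The sketch below reproduces that with the standard library only. sync_api and fetch are hypothetical stand-ins, not requests-html functions, and the error message is made up for the example.

```python
import asyncio


async def fetch():
    # Hypothetical stand-in for an asynchronous page fetch.
    return "page"


def sync_api():
    """Hypothetical synchronous wrapper that wants to drive the loop itself."""
    loop = asyncio.get_event_loop()
    if loop.is_running():
        # A running loop cannot be re-entered with run_until_complete.
        raise RuntimeError("loop already running; use the async API instead")
    return loop.run_until_complete(fetch())


async def inside_running_loop():
    # Same situation as a notebook cell: a loop is already running here.
    try:
        sync_api()
    except RuntimeError as exc:
        return str(exc)
    return "no error"


result = asyncio.run(inside_running_loop())
print(result)
```

nest_asyncio works around this by patching asyncio so that a running loop can be re-entered.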
@ustauss asyncio did not work for me unfortunately.
Did anyone figure this out?
@funiazi This should work
asession = AsyncHTMLSession()
r = await asession.get(API)
await r.html.arender()
resp=r.html.raw_html
html.raw_html seems to give the desired output
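Note that a bare top-level await like the one above is only accepted in environments that already run an event loop (recent IPython or Jupyter, for example). In a plain .py script the same calls would need to sit inside a coroutine that is started explicitly. Here is a minimal sketch of that wrapping. fetch_html is a hypothetical stand-in for the get/arender pair, so the example runs without a browser or network.

```python
import asyncio


async def fetch_html():
    # Hypothetical stand-in for:
    #   r = await asession.get(API); await r.html.arender(); return r.html.raw_html
    await asyncio.sleep(0)
    return b"<html></html>"


async def main():
    resp = await fetch_html()
    return resp


# In a plain script no loop is running yet, so start one explicitly.
resp = asyncio.run(main())
print(resp)

# The regular compiler rejects a bare top-level await.
try:
    compile("resp = await fetch_html()", "<script>", "exec")
    top_level_await_ok = True
except SyntaxError:
    top_level_await_ok = False
print(top_level_await_ok)
```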
Did anyone figure this out
did anyone figure this out abeg
Had the same problem, try this:
pip3 install -U requests[security]
Works for me.
Anyone fixed this issue?
@Grasshopper04's response worked for retrieving the HTML.
I got it working with the following steps
- Stop using jupyter notebook and run it as a python file
- Don't use async code
@YuTamura29 You are right. I ran the same code in VS Code, and it gave the desired output.
@YuTamura29 what did you use? I tried using PyCharm with this code:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('http://python-requests.org/')
r.html.render()
However, I got this error: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Win_x64/588429/chrome-win32.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I am using Python 3.8 and jupyter-notebook 6.1.3. With other libraries using asyncio it is possible simply to await coroutines directly from the notebook console, and for me, this works with requests_html up to a point.
The following code works as expected:
import asyncio
from requests_html import AsyncHTMLSession
loop = asyncio.get_running_loop()
asession = AsyncHTMLSession(loop=loop)
r = await asession.get('https://www.google.com')
script = '''
() => {return 'Hello World'}
'''
await r.html.arender(script=script, reload=False)
The output is Hello World.
However, if I call r.html.arender(script=script, reload=False) a second time, I get the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-66-620e73c0219e> in <module>
----> 1 await r.html.arender(script=script, reload=False)
/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/requests_html.py in arender(self, retries, script, wait, scrolldown, sleep, reload, timeout, keep_page)
613 """ Async version of render. Takes same parameters. """
614
--> 615 self.browser = await self.session.browser
616 content = None
617
/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/requests_html.py in browser(self)
727 self.loop = asyncio.get_event_loop()
728 if self.loop.is_running():
--> 729 raise RuntimeError("Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.")
730 self._browser = self.loop.run_until_complete(super().browser)
731 return self._browser
RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.
The stack trace suggests that the session object has for some reason reverted to an instance of HTMLSession. And indeed, before the first call to r.html.arender, which succeeds, r.html.session appears to be an instance of AsyncHTMLSession. But after the (successful) call to arender, r.html.session is HTMLSession.
Other async coroutines in Jupyter work fine for me; it's only arender that is causing an error.
Perhaps this sheds some light on the foregoing comments? (Incidentally, I believe support for running asynchronous code in the IPython REPL, and by extension in Jupyter notebooks, was added with v. 7 for Python 3.6+.)
Update: I notice that in requests_html.py, line 703 reads as follows:
html = HTML(url=self.url, html=content.encode(DEFAULT_ENCODING), default_encoding=DEFAULT_ENCODING)
This line appears within the HTML.arender coroutine definition. Here a new instance of the HTML class is created, but the current session is not passed to the constructor, nor is the _async Boolean flag set to True.
If I am reading the code correctly, the result is that arender initializes an HTML object with an HTMLSession instance. Since this is the async render method, it seems as though it should use the AsyncHTMLSession instead.
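The suspected mechanism can be illustrated with a small, self-contained sketch. The classes below (SyncSession, AsyncSession, Page) are hypothetical and only mimic the pattern described above: a constructor that falls back to a synchronous session when none is passed in. They are not the requests-html implementation.

```python
class SyncSession:
    """Hypothetical stand-in for a synchronous session."""


class AsyncSession:
    """Hypothetical stand-in for an asynchronous session."""


class Page:
    def __init__(self, content, session=None):
        self.content = content
        # Falls back to a synchronous session when none is supplied.
        self.session = session or SyncSession()

    def rebuilt_without_session(self, new_content):
        # Mirrors the pattern described above: the session is not forwarded.
        return Page(new_content)

    def rebuilt_with_session(self, new_content):
        # Forwarding the current session keeps the asynchronous one.
        return Page(new_content, session=self.session)


page = Page("<p>old</p>", session=AsyncSession())
dropped = page.rebuilt_without_session("<p>new</p>")
kept = page.rebuilt_with_session("<p>new</p>")
print(type(dropped.session).__name__, type(kept.session).__name__)
```

If the library behaves like this sketch, forwarding the existing session when the new object is built would presumably keep the asynchronous session in place for a second arender call.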
Run it in PyCharm as a single .py file. It will run.
I got it working with the following steps
- Stop using jupyter notebook and run it as a python file
- Don't use async code
My problem is that I am fetching this website session inside a function that is being called by an asynchronous event callback function inside my Discord bot Client, so even running this as a python file cannot fix my issue, sadly.
Has anyone figured out how to get this working in jupyter yet?
anything new about this issue?
I would like to share my experience (sorry in advance for the long post). My code contains:
session = AsyncHTMLSession()
Source = "https://www.flashscore.it/tennis/"
print("\n Estrazione da ", Source, "\n \n")
response = await session.get(Source)  # Get the html content
await response.html.arender(timeout=6000, sleep=3)  # timeout=15,60
print(response.html.raw_html)
soup1 = bs4.BeautifulSoup(response.html.raw_html, 'html.parser')
It is just meant to scrape a couple of tennis stats. If I run it in Spyder, arender takes forever (it hangs, I believe), and changing the "timeout" or "sleep" values makes no difference. The same happens if I run it with Python as a script (I can see the "Estrazione" print but nothing more). In Jupyter Notebook the kernel seems to be working on it (the icon is flashing), but the result is no different. Please let me know if I can add more information. I am running it with asyncio.
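One general asyncio technique for keeping an await from blocking indefinitely is to wrap it in asyncio.wait_for with an overall deadline. The sketch below uses a hypothetical slow_render coroutine as a stand-in for the render call. Whether this would actually unblock arender in this case is an assumption that has not been tested here. As a side note, the requests-html documentation describes the render timeout in seconds (default 8), so, if that reading is right, timeout=6000 would allow a very long wait before giving up.

```python
import asyncio


async def slow_render():
    # Hypothetical stand-in for a render call that does not finish in time.
    await asyncio.sleep(10)
    return "rendered"


async def render_with_deadline(seconds):
    try:
        # wait_for cancels the inner coroutine once the deadline passes.
        return await asyncio.wait_for(slow_render(), timeout=seconds)
    except asyncio.TimeoutError:
        return None


outcome = asyncio.run(render_with_deadline(0.05))
print(outcome)
```

A None result here signals that the deadline was hit, so the caller can retry or fall back to another approach.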
I had the same issue while rendering. You have to make sure you're using await in front of the render call, and that you're using arender(), not render().
That's what worked for me, but it won't work for all of you, because some of you are already using this function.
Unfortunately, that is not my case. The code already had:
await response.html.arender(timeout=6000, sleep=3)
It was working well until about a month ago, when it started running forever at this command. I am now using Selenium instead, because I haven't been able to make it work so far.
For me it takes a few seconds to load too, but it does load. When it's not loading, I don't know what to do, but it's good you found a way to do it with Selenium.
Unfortunately this is a workaround. The only thing not working in requests-html (for me) is arender, and it is a pity to switch libraries just for this. I could try to render the page externally, but it would not be the same (it would be a lot more complex).
I believe I need to add something: with httpx-html, the "arender" command shows the same behavior. Since everything else works, the problem in my case is probably with JavaScript (on my computer).
@funiazi This should work
asession = AsyncHTMLSession()
r = await asession.get(API)
await r.html.arender()
resp=r.html.raw_html
html.raw_html seems to give the desired output
@RichardPears's solution works for me.
At first it didn't, because while debugging I had split each line into its own Jupyter cell. Obviously it didn't work in my loop when run cell by cell. Just wanted to share my dumb mistake in case somebody else is doing the same.
For rendering a local file:
from pathlib import Path
from requests_html import HTML, AsyncHTMLSession
session = AsyncHTMLSession()
html = HTML(html=Path('file.html').read_text(), session=session)
await html.arender()
print(html.html)