requests-html
Can't render JavaScript in requests-html / Can't run multithreading in Pyppeteer
Hi,
I'm trying to render JavaScript from webpages, but requests-html fails every time to do it.
This is my code:
from requests_html import HTMLSession
s = HTMLSession()
r = s.get('https://httpbin.org')
r.html.render()
print(r.html.html)
Some important points:
- Searching the output with CTRL+F for the version number: 0.9.2 is shown without JavaScript, while 0.9.3 is shown once the JavaScript has rendered. It always shows 0.9.2.
- Searching for the keyword "cookie" returns "0 matches" (even when typing only "cook"), because that keyword only appears once the JavaScript has rendered.
It only prints the HTML as it was before the JavaScript ran. I've tried passing a larger timeout to render:
r.html.render(timeout=60)
But it still waits the default 8 seconds.
When trying to put:
r.html.render(sleep=60)
It waits for those 60 seconds and then does nothing; on top of that, it says that the connection has been lost.
I thought that maybe it didn't render the JavaScript because the request had no headers, so I added Chrome's (first the user-agent only, then all the headers shown in Chrome's network tab when accessing httpbin.org), but still with no success.
I've tried rendering the JavaScript with Pyppeteer, which is included in the requests-html library, and it does render the JavaScript (I don't understand why, since it's the same library requests-html includes); the only downside is that I have to scrape lots of links, and I couldn't find a way to run multiple instances of Pyppeteer.
By the way, I'm using PyCharm on Windows 10 with Python 3.6.1 (3.6 throws an error regarding a 'Deque' thing that can't be imported) / 3.7; maybe this info helps in solving the issue.
I've tried to be as detailed as possible with the problems I'm facing right now and I hope I can get the solutions I'm looking for.
Thanks in advance!
P.S. Chromium is downloaded and it shows in task manager when running the render() function (same happens when running the Pyppeteer code).
I've had the same issue and have been searching for a solution for quite some time.
I understand your situation too, because I've been searching for a solution for a few weeks and I don't know how long it will take until we get a proper answer on this issue.
So I'm guessing that this project is abandoned.
yeah.... same. I found a better solution. I switched over to Splash Lua Docker HTTP API and couldn't be more pleased with the results.
Same problem here. Even after changing asyncio.get_event_loop() to asyncio.new_event_loop() there is still a problem. It reports: signal only works in main thread, so it can't be used with multithreading.
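The "signal only works in main thread" message comes from Python itself rather than from requests-html: the standard library refuses to register signal handlers from any thread other than the main one. A minimal sketch that reproduces this using only the standard library (the helper name `try_install_handler` is made up for illustration):

```python
import signal
import threading


def try_install_handler(result):
    """Try to register a SIGINT handler from whichever thread calls this."""
    try:
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        result["error"] = None
    except ValueError as exc:
        # Python only allows signal.signal() in the main thread.
        result["error"] = str(exc)


result = {}
worker = threading.Thread(target=try_install_handler, args=(result,))
worker.start()
worker.join()

# The exact wording differs slightly between Python versions.
print(result["error"])
```

pyppeteer's launch() documents handleSIGINT, handleSIGTERM and handleSIGHUP options; passing False for all three when calling pyppeteer directly from a worker thread should skip the handler registration, though whether that is enough for this particular setup is an assumption and was not confirmed in this thread.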
+1. I don't know whether this is a bug or the project is unmaintained...
Here is my workaround
pyppeteer is a little heavy on resources and slow. Is there any other library, like aiohttp or requests, that can render a JavaScript page and has async support? requests_html is not working at all, and running pyppeteer with async is heavy on system resources and takes quite a long time: I passed 10 URLs with async and it took more than a minute to render a JavaScript website and return the result.
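On the resource cost of rendering many URLs at once: one common pattern is to cap how many pages are in flight with a semaphore. The sketch below shows only that pattern and needs Python 3.7+ for asyncio.run. `render_page` is a hypothetical stand-in that sleeps instead of launching a browser, and the example.com URLs are placeholders; a real version would put the pyppeteer or requests-html call inside it.

```python
import asyncio


async def render_page(url):
    """Hypothetical stand-in for one page render (no browser is launched)."""
    await asyncio.sleep(0.1)
    return f"rendered:{url}"


async def render_all(urls, max_concurrent=3):
    """Render every URL, with at most `max_concurrent` running at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with semaphore:
            return await render_page(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://example.com/page/{i}" for i in range(10)]
results = asyncio.run(render_all(urls))
print(len(results), "pages done")
```

This does not make any single render faster; it only bounds how many Chromium pages compete for memory and CPU at the same moment.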
was there any solution?
How do I set the timeout for rendering JavaScript? This is my code:
def get_response(self, url):
    session = HTMLSession()
    res = session.request(method='get', url=url, headers=self.headers, timeout=5)
    try:
        print('creating directory to append temporary file')
        os.makedirs('redfin_com_temporary')
    except FileExistsError:
        print('directory created')
    # create response temporary file
    f = open('redfin_com_temporary/res.html', 'w+')
    f.write(res.text)
    f.close()
    # status code
    print(f'Site Status Code: {res.status_code}')
    return res.html.render()
I got this error:
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.
Can anybody help me?
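The 8000 ms in that error matches render()'s default 8 second timeout, and render() takes a timeout keyword, as in the r.html.render(timeout=60) call in the original post. A minimal sketch of passing a longer value, assuming the installed requests-html version actually forwards it to the browser (the original post reports a setup where it did not). The helper name `render_or_none` is hypothetical, and it catches Exception broadly only so the sketch needs no pyppeteer import; catching pyppeteer.errors.TimeoutError, the class named in the error above, would be tighter.

```python
def render_or_none(html, timeout=30.0):
    """Render `html` (for example `res.html`) with a longer timeout.

    Returns the rendered markup, or None if rendering raised.
    """
    try:
        # `timeout` is in seconds; the default in the error above was 8.
        html.render(timeout=timeout)
    except Exception as exc:
        print(f"render failed with timeout={timeout}s: {exc!r}")
        return None
    return html.html
```

In the get_response method above this would replace the last line, for instance `return render_or_none(res.html, timeout=30)`. Note that render() itself returns nothing and updates res.html in place, so returning res.html.render() directly always yields None.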
My solution:
1. Find the browser() function in requests_html.py:
//$python\Lib\site-packages\requests_html.py
async def browser(self):
    if not hasattr(self, "_browser"):
        self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
    return self._browser
2. Replace the headless value:
headless=False
3. Then, when render() runs, it opens a Chromium window and renders successfully.
This worked for me after countless other things didn't. Thanks!
in _cleanup_tmp_user_data_dir raise IOError('Unable to remove Temporary User Data')