requests-html
Select an element and "click" it?
I can't find anything in the documentation about how to run handlers other than (presumably) body's onload. Is there a way to call render with an element and action? That would be perfect.
Any help here?
+1 i'd really love this as a feature!! No selenium anymore to scrape something from a react / angular / vue app.
You can actually write scripts that are run as part of the rendering step, but that needs JavaScript knowledge, so some help would be needed:
script = """
() => {
    var item1 = document.getElementsByClassName('kz-icon kz-icon-xs icon--clickable icon--color-dark-snow-200');
    for (var i = 0; i < 1; ++i) { item1[i].click(); }
}
"""
r.html.render(sleep=5, timeout=10000, script=script, keep_page=True)  # this call executes the JS in the page
site = r.html
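To get closer to the "render with an element and action" idea from the original question, a click script like the one above could be generated from a CSS selector. This is only a sketch: `build_click_script` and `render_and_click` are hypothetical helper names, not part of requests-html, and the only library calls assumed are `HTMLSession().get()` and `r.html.render(script=..., sleep=..., keep_page=...)` as used in this thread.

```python
import json


def build_click_script(selector: str) -> str:
    """Return a JS arrow function (as a string) that clicks the first match.

    Hypothetical helper, not part of requests-html. json.dumps turns the
    selector into a safely quoted JS string literal.
    """
    return (
        "() => {\n"
        f"  const el = document.querySelector({json.dumps(selector)});\n"
        "  if (!el) { return false; }  // nothing matched, nothing clicked\n"
        "  el.click();\n"
        "  return true;\n"
        "}"
    )


def render_and_click(url: str, selector: str, sleep: float = 5):
    """Fetch url, try to click the first element matching selector."""
    # Imported lazily so the script builder works without requests-html installed.
    from requests_html import HTMLSession

    r = HTMLSession().get(url)
    clicked = r.html.render(
        script=build_click_script(selector), sleep=sleep, keep_page=True
    )
    return clicked, r.html
```

Returning `false` instead of calling `.click()` on a missing element avoids a "Cannot read property 'click' of undefined" failure and lets the Python side decide what to do next.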
TL;DR: I don't think this functionality is available yet in requests-HTML, as the script argument seems to be meant for scraping static HTML only (and not for running JS on a dynamic page). It seems Selenium is the only option when a button click changes existing data on the page instead of opening a new page, and the handler for the click lives in the backend.
When I try to put js in the script I get a puppeteer error.
For example, I have the following code:
script = """
() => {
    let button = document.getElementsByClassName("course-title")[0]
    button.click()
}
"""
r.html.render(sleep=5, timeout=10000, script=script, keep_page=True)
and I get the error:
pyppeteer.errors.ElementHandleError: Evaluation failed: TypeError: Cannot read property 'click' of undefined
    at pyppeteer_evaluation_script:4:20
I thought that maybe the page content wasn't rendering properly, so I wrote the following (right above where script was declared):
with open('response.txt', 'w', encoding='utf-8') as f:
    f.write(r.text)
to get the content of the HTML, and the element did exist on the page.
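Note that `r.text` is the HTML the server sent, which is not necessarily what the browser DOM contains at the moment the script runs. A possible way to check the live DOM instead is to have the script report what it sees. This is a sketch under assumptions: `build_probe_script` is a hypothetical helper, and it relies on `render()` handing back the script's return value, as the `1+1` experiment in this thread suggests.

```python
import json


def build_probe_script(class_name: str) -> str:
    """JS that reports how many elements with class_name exist when it runs.

    Hypothetical diagnostic helper, not part of requests-html.
    """
    return (
        "() => {\n"
        f"  const found = document.getElementsByClassName({json.dumps(class_name)});\n"
        "  return {count: found.length, readyState: document.readyState};\n"
        "}"
    )


print(build_probe_script("course-title"))
```

Passing the result as `script=` to `r.html.render(...)` should give back a small dictionary; a `count` of 0 would mean the element is not in the DOM at the time the script is evaluated, even if it shows up in `r.text`.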
Similar error when trying to use jQuery (website has jQuery enabled)
Code:
script = """
() => {
    $(document).ready(function() {
        $(".course-title")[0].click();
    })
}
"""
response = r.html.render(sleep=5, timeout=10000, script=script, keep_page=True)
Error: response is None
So then I wanted to check if JS was even working and wrote:
script = """
() => {
    return 1+1;
}
"""
response = r.html.render(sleep=5, timeout=10000, script=script, keep_page=True)
print(response)
Output: 2
This suggests that the role of passing a script to the render function is not to run the script on the page, but rather to run a script that scrapes the static HTML content of the page. Otherwise, I would expect the response to be an object (a requests.Response() object with attributes such as .text and methods such as .json()).
Furthermore, when a button click edits content on the page and the handler for the click is in the backend (which is really the only reason you would have to click a button in the first place; otherwise you could just look at the URL endpoint, or at what the click handler does), the script is of little use if you cannot get the resulting HTML back.
This matches the documentation for requests-HTML, where the only mention of render(script=script) is under the line "You can also render JavaScript pages without Requests:". In that section, the JS is simply run on an HTML string (and only extracts information from the HTML), and the return value of render is just the return value of the JS code in script.
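One way to test this reading is to look separately at the two things a render call leaves behind: the value it returns and the content of `r.html` afterwards. The sketch below is an assumption-laden experiment, not a statement about how the library is documented to behave; `TITLE_SCRIPT` and `render_and_compare` are hypothetical names, and it needs requests-html plus its Chromium download to actually run.

```python
TITLE_SCRIPT = """
() => {
    document.title = "changed by script";  // mutate the live page
    return document.title;                 // value handed back to Python
}
"""


def render_and_compare(url: str):
    """Return (script_result, title_text_in_r_html) for manual comparison."""
    # Lazy import: only needed when the experiment is actually run.
    from requests_html import HTMLSession

    r = HTMLSession().get(url)
    result = r.html.render(script=TITLE_SCRIPT, sleep=1)
    title = r.html.find("title", first=True)
    return result, (title.text if title is not None else None)
```

If the title found in `r.html` after rendering reflects the change made by the script, that would indicate the script ran against the live page and that the modified HTML is available through `r.html` rather than through the return value.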
I wasn't able to use the 'load' and 'DOMContentLoaded' events. Did not investigate why yet. I suspect the script is run using the "console", so it cannot get those events. But this is just a wild guess.
For a quick and dirty solution, I was able to use setTimeout:
script = """
() => {
    setTimeout(function(){
        document.querySelectorAll("a")[2].click();
    }, 3000);
}
"""
If I use r.html.render(sleep=10,script=script)
I am able to get the content of the page after the click was executed.
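A variation on the setTimeout idea is to poll for the element rather than wait a fixed 3 seconds. The following is a sketch only: `build_wait_and_click_script` is a hypothetical helper, the selector and timings are placeholders, and it assumes that pyppeteer's `page.evaluate` waits for a returned Promise the way Puppeteer does.

```python
import json


def build_wait_and_click_script(
    selector: str, timeout_ms: int = 5000, interval_ms: int = 100
) -> str:
    """JS that polls for selector, clicks the first match, resolves true/false."""
    return f"""
() => new Promise((resolve) => {{
    const deadline = Date.now() + {int(timeout_ms)};
    const poll = () => {{
        const el = document.querySelector({json.dumps(selector)});
        if (el) {{
            el.click();
            resolve(true);       // element appeared and was clicked
        }} else if (Date.now() >= deadline) {{
            resolve(false);      // gave up waiting
        }} else {{
            setTimeout(poll, {int(interval_ms)});
        }}
    }};
    poll();
}})
"""


print(build_wait_and_click_script("a.course-title", timeout_ms=3000))
```

The `sleep` argument to `render` would still need to be long enough for whatever the click triggers to finish before the page content is read.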
Hope this is useful.