requests-html
Accepting cookies dialog or removing it
Problem description
Hello 🙂!
I am trying to take usable screenshots of many web pages (not just siemens.de as in the example). Sadly, there is always some kind of cookie pop-up that I can't remove. I know how to do it with Selenium, for instance, but I would rather not use any library other than yours. The speed with parallel requests is insane and it's easy to use :tada:!
Is there a way to accept the dialog or just remove it? A generic solution would obviously be best, but if that is not possible I am also interested in a domain-specific one (since everyone uses different cookie banners...).
Thanks in advance :+1:

Code Sample
from requests_html import AsyncHTMLSession
asession = AsyncHTMLSession()
url = 'https://www.siemens.de'
r = await asession.get(url)
await r.html.arender(keep_page=True) # "will allow you to interact with the browser page through r.html.page"
screenshot = await r.html.page.screenshot(options={'fullPage':True}) # PNG bytes
# optional (display the page with IPython) / you can also add 'path':'example.png' to the screenshot options
# to save it on disk
from IPython.display import display_png
display_png(screenshot, raw=True)
Somehow I just remembered that requests_html can execute JS code, so after some research I managed to remove the cookie banner :tada:. (I could also click on it, but then I would need a waiting condition for it to disappear, which seems more complicated.) Of course, this is not a generic solution :neutral_face:.
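For comparison, the click-and-wait alternative mentioned above could look roughly like the sketch below. It relies on pyppeteer's `page.click` and `page.waitForSelector`, which are reachable through `r.html.page` after `arender(keep_page=True)`. The two selectors are assumptions about a OneTrust banner and the helper name `accept_cookies` is hypothetical; check the real element ids in the page's DOM before using it.

```python
import asyncio

# Assumed selectors for a OneTrust banner (not taken from the issue);
# inspect the page to confirm them.
ACCEPT_BUTTON = "#onetrust-accept-btn-handler"
BANNER = "#onetrust-banner-sdk"


async def accept_cookies(page, button=ACCEPT_BUTTON, banner=BANNER, timeout=5000):
    """Click the accept button, then wait until the banner is hidden.

    `page` is a pyppeteer Page, e.g. r.html.page. A timeout error from
    pyppeteer propagates if the button never appears or the banner
    stays visible for longer than `timeout` milliseconds.
    """
    # Wait for the button to be rendered and visible before clicking it.
    await page.waitForSelector(button, options={"visible": True, "timeout": timeout})
    await page.click(button)
    # Wait for the banner to be hidden or removed from the DOM.
    await page.waitForSelector(banner, options={"hidden": True, "timeout": timeout})


# Usage in a notebook, after `await r.html.arender(keep_page=True)`:
#     await accept_cookies(r.html.page)
```

Unlike hiding the element, clicking also stores the consent, so it may change what the page loads afterwards.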
Solution (sort of)
Compared to my previous code, I added hide_cookies, a JavaScript function written as a Python string, and then call await r.html.page.evaluate(hide_cookies) to execute it.
from requests_html import AsyncHTMLSession
asession = AsyncHTMLSession()
url = 'https://www.siemens.de'
# this javascript function will only work for siemens!
hide_cookies = """
function hideCookiesDialog() {
    var cookies = document.getElementById("onetrust-consent-sdk");
    cookies.style.display = "none";
}
"""
r = await asession.get(url)
await r.html.arender(keep_page=True) # "will allow you to interact with the browser page through r.html.page"
await r.html.page.evaluate(hide_cookies) # <-----------------
screenshot = await r.html.page.screenshot(options={'fullPage':True}) # PNG bytes
# optional (display the page with IPython) / you can also add 'path':'example.png' to the screenshot options
# to save it on disk
from IPython.display import display_png
display_png(screenshot, raw=True)
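A slightly more general variant of the same idea is to keep a list of known banner containers and hide whichever of them are present. This is only a sketch and still not a truly generic solution: the helper `build_hide_script` and the list `BANNER_SELECTORS` are hypothetical, and only the first selector comes from the code above; the other entries are assumptions that need checking per site.

```python
import json

# Only "#onetrust-consent-sdk" comes from the example above. The other
# entries are assumptions and should be verified against each site.
BANNER_SELECTORS = [
    "#onetrust-consent-sdk",
    "#CybotCookiebotDialog",
    "[id*='cookie-banner']",
]


def build_hide_script(selectors):
    """Return a JS function (as a string) that hides every element
    matching any of the given CSS selectors and returns how many
    elements it hid."""
    return """
    function hideBanners() {
        var selectors = %s;
        var hidden = 0;
        selectors.forEach(function (sel) {
            document.querySelectorAll(sel).forEach(function (el) {
                el.style.display = "none";
                hidden += 1;
            });
        });
        return hidden;
    }
    """ % json.dumps(selectors)


# Usage in a notebook, after `await r.html.arender(keep_page=True)`:
#     hidden = await r.html.page.evaluate(build_hide_script(BANNER_SELECTORS))
```

Because `querySelectorAll` returns an empty list for selectors that match nothing, the script does not fail on pages that use none of the listed banners, and the returned count shows whether anything was actually hidden.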
Hello, sorry for spamming (I don't mean to bother you, and it's not an urgent matter), but one of my colleagues just gave me an idea :thinking:. Unless I am mistaken, the browser that requests_html uses is Chromium-based. Would it then be possible to add Chrome extensions to block such dialogs :thinking:?