requests-html

Accepting cookies dialog or removing it

Open ThibTrip opened this issue 4 years ago • 3 comments

Problem description

Hello 🙂!

I am trying to obtain usable screenshots from many webpages (not just Siemens like in the example). Sadly, there is always some kind of cookie pop-up that I can't remove. I know how to do it with, for instance, Selenium, but I would rather not use any library other than yours. The speed with parallel requests is insane and it's easy to use :tada:!

Is there a way to accept the dialog or simply remove it? A generic solution would obviously be best, but I am also interested in a domain-specific one if that is not possible (since every site uses a different cookie banner...).

Thanks in advance :+1:


Code Sample

# Top-level `await` only works in Jupyter/IPython (or `python -m asyncio`), not in a plain script
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()
url = 'https://www.siemens.de'
r = await asession.get(url)
await r.html.arender(keep_page=True) # "will allow you to interact with the browser page through r.html.page"
screenshot = await r.html.page.screenshot(options={'fullPage':True}) # PNG bytes

# Optional: display the screenshot inline with IPython. Alternatively, add
# 'path': 'example.png' to the screenshot options to save it to disk.
from IPython.display import display_png
display_png(screenshot, raw=True)

ThibTrip · Feb 25 '21 19:02

Somehow I just remembered that requests_html can execute JavaScript, so after some research I managed to remove the cookie banner :tada:. (I could also click its accept button, but then I would need a wait condition until the banner disappears, which seems more complicated.) Of course, this is not a generic solution :neutral_face:.
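The click-and-wait alternative could look roughly like the sketch that follows, assuming `page` is the pyppeteer page exposed as `r.html.page` after `arender(keep_page=True)`. The OneTrust ids `#onetrust-accept-btn-handler` and `#onetrust-banner-sdk` are assumptions (only `onetrust-consent-sdk` appears in the Siemens example), and `accept_cookies` is a hypothetical helper, not part of requests_html.

```python
# Assumed OneTrust ids; other consent platforms use different ones.
ACCEPT_BUTTON = "#onetrust-accept-btn-handler"
BANNER = "#onetrust-banner-sdk"


async def accept_cookies(page, button=ACCEPT_BUTTON, banner=BANNER, timeout_ms=5000):
    """Click the consent button on a pyppeteer page and wait until the banner
    is hidden. Returns False if the button is not on the page at all."""
    if await page.querySelector(button) is None:
        return False  # no banner shown (or a different consent platform)
    await page.click(button)
    # {"hidden": True} resolves once the banner is removed from the DOM or is
    # no longer visible; pyppeteer raises a timeout error after timeout_ms.
    await page.waitForSelector(banner, {"hidden": True, "timeout": timeout_ms})
    return True
```

It would be called as `await accept_cookies(r.html.page)` right before taking the screenshot. Waiting on the banner avoids a fixed sleep, but it still only covers sites that use these particular ids.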

Solution (sort of)

Compared to my previous code, I added hide_cookies, a JavaScript function stored in a Python string, and then call await r.html.page.evaluate(hide_cookies) to execute it.

# Top-level `await` only works in Jupyter/IPython (or `python -m asyncio`), not in a plain script
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()
url = 'https://www.siemens.de'
# this javascript function will only work for siemens!
hide_cookies = """
function hideCookiesDialog(){
    var cookies = document.getElementById("onetrust-consent-sdk");
    if (cookies) {  // the banner may be missing, e.g. if it was already dismissed
        cookies.style.display = "none";
    }
}
"""

r = await asession.get(url)
await r.html.arender(keep_page=True) # "will allow you to interact with the browser page through r.html.page"
await r.html.page.evaluate(hide_cookies) # <-----------------
screenshot = await r.html.page.screenshot(options={'fullPage':True}) # PNG bytes

# Optional: display the screenshot inline with IPython. Alternatively, add
# 'path': 'example.png' to the screenshot options to save it to disk.
from IPython.display import display_png
display_png(screenshot, raw=True)
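To get a bit closer to a generic solution, the same hiding idea could be extended to a list of selectors from several common consent platforms. This is only a sketch: apart from `#onetrust-consent-sdk` (taken from the Siemens example), the selectors are assumptions about what Cookiebot, Usercentrics and Quantcast Choice typically use, and `build_hide_script` is a hypothetical helper.

```python
import json

# Hypothetical selector list. Only "#onetrust-consent-sdk" comes from the
# Siemens example; the others are assumptions and may differ between sites.
COMMON_CONSENT_SELECTORS = [
    "#onetrust-consent-sdk",   # OneTrust
    "#CybotCookiebotDialog",   # Cookiebot
    "#usercentrics-root",      # Usercentrics
    ".qc-cmp2-container",      # Quantcast Choice
]


def build_hide_script(selectors=COMMON_CONSENT_SELECTORS):
    """Return a JS function (as a string) that hides every element matching any
    of the given CSS selectors and returns how many elements it hid."""
    # json.dumps yields a valid JS array literal with properly escaped strings
    selector_array = json.dumps(list(selectors))
    return f"""
function hideConsentBanners() {{
    let hidden = 0;
    for (const sel of {selector_array}) {{
        document.querySelectorAll(sel).forEach(el => {{
            el.style.setProperty("display", "none", "important");
            hidden += 1;
        }});
    }}
    return hidden;
}}
"""
```

It would replace hide_cookies, e.g. `hidden = await r.html.page.evaluate(build_hide_script())`, where `hidden` is the number of elements that were hidden. Some banners may also lock page scrolling, which hiding the element alone does not undo.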

ThibTrip · Feb 25 '21 20:02

Hello, sorry for spamming (I don't mean to bother you, and it's not an urgent matter), but one of my colleagues just gave me an idea :thinking:. Unless I am mistaken, the browser that requests_html uses is Chromium-based. Would it then be possible to add Chrome extensions that block such dialogs :thinking:?
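One way the extension idea might be wired up, assuming the installed requests_html version accepts a `browser_args` list in the session constructor and that you have an unpacked extension folder on disk (the path and the `extension_browser_args` helper are hypothetical). `--disable-extensions-except` and `--load-extension` are standard Chromium switches. A caveat: requests_html launches Chromium headless, and as far as I know the older headless mode driven by pyppeteer does not load extensions, so this may have no effect without a non-headless browser.

```python
import os


def extension_browser_args(extension_dir, base_args=("--no-sandbox",)):
    """Build Chromium switches that load one unpacked extension.

    base_args is assumed to mirror requests_html's default browser args;
    extension_dir is a hypothetical folder containing the extension."""
    path = os.path.abspath(extension_dir)  # Chromium expects a full path
    return list(base_args) + [
        f"--disable-extensions-except={path}",  # disable all other extensions
        f"--load-extension={path}",             # load this unpacked extension
    ]
```

Usage would then be something like `asession = AsyncHTMLSession(browser_args=extension_browser_args("./cookie-blocker"))`.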

ThibTrip · Feb 26 '21 09:02