
Problem running nodriver in headless mode

Open • KenyOnFire opened this issue 1 year ago • 5 comments

I was testing the nodriver module. When I tried headless mode, I discovered that enabling it modifies the user-agent, which makes the browser detectable as a bot. Below is the user-agent I get when running headless. Thank you!

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/128.0.0.0 Safari/537.36

TEMPORARY FIX: Inside the nodriver module there is a class called Config. On line 185, right after if self.headless: args.append("--headless=new"), I added a request (using the requests module) that fetches the latest Chrome user-agent, which does not contain 'Headless'. Because it is passed as a launch argument, the 'Headless' text is gone before execution starts. I'm leaving the code here in case it helps someone:

import platform  # add these imports at the top of the module
import requests

# Map platform.system() to the OS token used in the user-agent strings
so_key = {"windows": "windows", "linux": "linux", "darwin": "mac"}[platform.system().lower()]
# Take the first Chrome (not Firefox) user-agent for this OS from the published list
user_agents = requests.get("https://jnrbsn.github.io/user-agents/user-agents.json", timeout=10).json()
ua = next(ua for ua in user_agents if so_key in ua.lower() and "chrome" in ua.lower() and "firefox" not in ua.lower())
args.append("--user-agent=" + ua)

KenyOnFire • Sep 03 '24 00:09
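The one-liner in the fix above can be pulled out into a small function so the selection logic can be tested without a network call or a browser. This is a minimal sketch: the function name pick_chrome_user_agent is hypothetical, and the extra exclusions for Edge, Opera and Android strings are an assumption added on top of the original filter (those user-agents also mention "Chrome" or "Linux").

```python
from __future__ import annotations

import platform

# Same OS-to-token mapping as the fix above; platform.system() returns
# "Windows", "Linux" or "Darwin".
OS_TOKENS = {"windows": "windows", "linux": "linux", "darwin": "mac"}

# Assumption: Edge ("Edg/") and Opera ("OPR/") user-agents also contain "Chrome",
# and Android ones contain "Linux", so they are skipped along with Firefox.
EXCLUDED_TOKENS = ("firefox", "edg/", "opr/", "android", "headless")


def pick_chrome_user_agent(user_agents: list[str], system: str = "") -> str:
    """Return the first desktop Chrome user-agent in `user_agents` for the given OS."""
    os_token = OS_TOKENS[(system or platform.system()).lower()]
    for ua in user_agents:
        lowered = ua.lower()
        # Must mention both the OS token and Chrome
        if os_token not in lowered or "chrome" not in lowered:
            continue
        # Skip other browsers and platforms that reuse those tokens
        if any(token in lowered for token in EXCLUDED_TOKENS):
            continue
        return ua
    raise LookupError(f"no Chrome user-agent found for OS token {os_token!r}")
```

With the list from the fix, this would be used as ua = pick_chrome_user_agent(requests.get("https://jnrbsn.github.io/user-agents/user-agents.json", timeout=10).json()) followed by the same args.append("--user-agent=" + ua).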

The irony: a library designed to keep Chrome stealthy as a web scraper inadvertently reveals itself by failing to suppress the very "HeadlessChrome" signature it was supposed to conceal in headless mode.

ioio101 • Sep 03 '24 12:09

requests.get("https://jnrbsn.github.io/user-agents/user-agents.json").json()

Hello. That is unnecessary. You can replace it manually with useragent_override and the replace() method.

devblack • Sep 04 '24 01:09
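One reading of the suggestion above, as a minimal sketch: read the tab's current user-agent, rename the HeadlessChrome token with str.replace(), and send the result back through the CDP command Emulation.setUserAgentOverride (cdp.emulation.set_user_agent_override in nodriver). The helper names strip_headless and override_user_agent are hypothetical, and this is not necessarily what devblack had in mind.

```python
def strip_headless(user_agent: str) -> str:
    """Rename the 'HeadlessChrome/<version>' product token to 'Chrome/<version>'."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


async def override_user_agent(tab) -> str:
    """Read the tab's user-agent, strip 'Headless' from it and apply it via CDP."""
    # Imported lazily so strip_headless() stays usable without nodriver installed.
    from nodriver import cdp

    current = await tab.evaluate("navigator.userAgent")
    patched = strip_headless(current)
    # Emulation.setUserAgentOverride changes the HTTP header and navigator.userAgent,
    # but only for this tab's target.
    await tab.send(cdp.emulation.set_user_agent_override(user_agent=patched))
    return patched
```

Call override_user_agent(tab) on a tab before navigating it to the target site. The override only covers that one tab, and the client hints exposed through navigator.userAgentData may still carry a headless brand; set_user_agent_override also accepts a user_agent_metadata argument for that case.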

requests.get("https://jnrbsn.github.io/user-agents/user-agents.json").json()

Hello. That is unnecessary. You can replace it manually with useragent_override and the replace() method.

I know it isn't necessary or practical in the long run, but I couldn't apply your approach. Could you be more specific about using the useragent_override method? I can't find any documentation for it. Besides, the idea is for the browser to carry a user-agent without the word "Headless" from the moment it starts, as undetected-chromedriver does. If you could share example code that applies this fix, that would be great and I could close the thread.

PS: I have also tried the following code, but it only injects the CDP command into the current tab, not the entire browser:

async def change_useragent(self, useragent):
    # feed_cdp() sends the command without waiting for a reply; the override applies to this tab only
    self.page.feed_cdp(cdp.emulation.set_user_agent_override(useragent))
    return await self.page.reload()

KenyOnFire • Sep 04 '24 03:09
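For a browser-wide change applied before startup, as asked above, Chrome's --user-agent command-line switch sets the user-agent for the whole browser process, so tabs opened later get it too. A minimal sketch, assuming nodriver.start() forwards extra switches through its browser_args parameter (the same mechanism as appending to Config's args in the fix at the top of the thread); user_agent_args and start_headless are hypothetical names.

```python
def user_agent_args(user_agent: str) -> list[str]:
    """Build the Chrome switch that fixes the user-agent for the whole browser."""
    # Guard against accidentally passing the headless string back in.
    if "headless" in user_agent.lower():
        raise ValueError("user-agent still contains a 'Headless' token")
    return [f"--user-agent={user_agent}"]


async def start_headless(user_agent: str):
    """Start nodriver headless with `user_agent` applied to every tab."""
    # Imported lazily so user_agent_args() works without nodriver installed.
    import nodriver as uc

    return await uc.start(headless=True, browser_args=user_agent_args(user_agent))
```

The switch is passed as a single list element, so the spaces inside the user-agent do not need extra quoting.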

The irony: a library designed to keep Chrome stealthy as a web scraper inadvertently reveals itself by failing to suppress the very "HeadlessChrome" signature it was supposed to conceal in headless mode.

Just run a JavaScript snippet that does it, or start Chrome with the custom agent from the command line, and stop crying.

boludoz • Sep 04 '24 05:09
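The JavaScript route mentioned above can be sketched with the CDP command Page.addScriptToEvaluateOnNewDocument (cdp.page.add_script_to_evaluate_on_new_document in nodriver), which runs a snippet in every new document of a tab before the page's own scripts. A minimal sketch with hypothetical helper names; it only changes what page JavaScript sees in navigator.userAgent, not the User-Agent HTTP header, which the command-line switch or the Emulation override handles.

```python
import json


def user_agent_override_script(user_agent: str) -> str:
    """Return JavaScript that redefines navigator.userAgent for page scripts."""
    # json.dumps yields a correctly escaped string literal that is also valid JavaScript.
    return (
        "Object.defineProperty(Navigator.prototype, 'userAgent', "
        f"{{ get: () => {json.dumps(user_agent)}, configurable: true }});"
    )


async def install_user_agent_script(tab, user_agent: str) -> None:
    """Register the override so it runs before page scripts in each new document."""
    # Imported lazily so user_agent_override_script() works without nodriver installed.
    from nodriver import cdp

    script = user_agent_override_script(user_agent)
    await tab.send(cdp.page.add_script_to_evaluate_on_new_document(source=script))
```

The script is registered per tab and applies from the next navigation onward, so install it before loading the target page.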

Study the documentation on user agents.

Toxenskiy • Sep 04 '24 09:09

Well, I don't know what your problem is:


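import nodriver as uc

async def main():
    browser = await uc.start(headless=True)
    tab = await browser.get('https://deviceandbrowserinfo.com/are_you_a_bot')
    # Save a full-page screenshot; the file name is generated automatically
    await tab.save_screenshot(full_page=True)

if __name__ == '__main__':
    # Top-level await is not available in a plain script, so run main() on nodriver's loop
    uc.loop().run_until_complete(main())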

Screenshot: deviceandbrowserinfo.com__are_you_a_bot_2025-03-31_10-40-22.jpg

ultrafunkamsterdam • Mar 31 '25 08:03
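To check the specific symptom from the original report rather than reading a screenshot, the tab can be asked for navigator.userAgent directly and compared with the HeadlessChrome string quoted at the top. A minimal sketch; the helper names are hypothetical, and whether the token shows up depends on the nodriver and Chrome versions in use.

```python
def is_headless_user_agent(user_agent: str) -> bool:
    """Return True if the user-agent exposes the 'HeadlessChrome' product token."""
    return "headlesschrome" in user_agent.lower()


async def report_headless_user_agent(url: str = "https://deviceandbrowserinfo.com/are_you_a_bot") -> str:
    """Open `url` in headless nodriver and report whether 'HeadlessChrome' leaks."""
    # Imported lazily so is_headless_user_agent() works without nodriver installed.
    import nodriver as uc

    browser = await uc.start(headless=True)
    try:
        tab = await browser.get(url)
        user_agent = await tab.evaluate("navigator.userAgent")
        print(user_agent, "| HeadlessChrome exposed:", is_headless_user_agent(user_agent))
        return user_agent
    finally:
        browser.stop()
```

Run it with uc.loop().run_until_complete(report_headless_user_agent()) after import nodriver as uc.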