crawl4ai
[Bug]: Unable to retrieve Google search result information when requesting from Docker
crawl4ai version
6
Expected Behavior
When requesting a Google search URL, for example https://www.google.com/search?q=hotel, I expect to be able to retrieve the different domains present on the results page and the rank of each one.
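For context, the post-processing I run on the returned HTML is roughly the following (a simplified sketch; the href-extraction regex stands in for the real parsing, which depends on Google's current markup):

```python
import re
from urllib.parse import urlparse

def domains_with_rank(html: str) -> list[tuple[int, str]]:
    """Return (rank, domain) pairs for external links, in order of appearance."""
    seen: list[str] = []
    for href in re.findall(r'href="(https?://[^"]+)"', html):
        domain = urlparse(href).netloc
        # Skip Google's own domains and duplicates; rank = order of first appearance.
        if domain and not domain.endswith("google.com") and domain not in seen:
            seen.append(domain)
    return list(enumerate(seen, start=1))
```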
Current Behavior
Very recently, my crawler started having trouble returning information from Google search result URLs, for example: https://www.google.com/search?q=hotel.
Google seems to have implemented a more sophisticated anti-bot detection system.
I've tried all of the usual options: changing headers (user-agent, accept, language, referer, host), using proxies, setting magic=True, and using a persistent browser context (see the sketch under Code snippets below). None of these worked; Google keeps returning the captcha page stating that unusual traffic has been detected.
Interestingly, when I run the crawler outside of the Docker environment, it works normally.
Is this reproducible?
Yes
Inputs Causing the Bug
- URLs: https://www.google.com/search?q=hotel, but it could be any Google search result URL of the form /search?q=...
Steps to Reproduce
Code snippets
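A minimal sketch of the kind of call that triggers the captcha page. This assumes the AsyncWebCrawler API with BrowserConfig/CrawlerRunConfig; exact parameter names may differ between crawl4ai versions, and the header/user-agent values shown are placeholders rather than the exact ones I used:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    # One of the options tried: a custom user agent (placeholder value here).
    browser_config = BrowserConfig(
        headless=True,
        user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36",
    )
    # "magic" anti-detection option was also tried.
    run_config = CrawlerRunConfig(magic=True)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://www.google.com/search?q=hotel",
            config=run_config,
        )
        # Inside Docker this HTML is Google's "unusual traffic" captcha page;
        # outside Docker it is the normal results page.
        print(result.html[:500])

asyncio.run(main())
```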
OS
Linux
Python version
3.12.6
Browser
Chrome
Browser version
134
Error logs & Screenshots (if applicable)
No response