se-scraper
Hey everyone.
Please leave your bug reports and requests for features here. I will maintain this package in the future.
Is there a possibility to scrape more than the first page? Like 20/50/100 results per keyword? What about the anti-bot reCaptcha? Maybe try to implement uncaptcha2? https://github.com/ecthros/uncaptcha2
Thank you for making this great scraper, one day I hope I will help with the development :D
Hey, I would like to know if it is possible to scrape more than the first page too. Thanks!
What about support for duckduckgo as search engine?
Is there a possibility to scrape more than the first page?
Lol why am I creating this issue and then ignoring it? Sorry guys.
Is there a possibility to scrape more than the first page? Like 20/50/100 results per keyword?
I will implement this for google and bing as a start.
However, I will not implement changing the number of SERP results per page to 20/50/100, because in my experience Google then blocks the scraping very quickly. Can anybody confirm?
What about support for duckduckgo as search engine?
I will implement it right now.
Thanks for your comments guys!
Shameless advertisement because I am a broke guy: if you want to scrape in large quantities, just use my paid service: https://scrapeulous.com/
I would be open to paying for the service if:
- you can scrape up to 100 results per keyword
- if you can filter the proxies per country
- it's cheap and well maintained
uncaptcha2 is seriously nice. Why did I not know that it exists?
you can scrape up to 100 results per keyword
Does it matter to you whether there are 10 pages with 10 results each or one page with 100 results?
For example, one can scrape with a URL such as https://www.google.de/search?q=scraping&num=100, but in my experience you then tend to get blocked much more quickly than when not specifying the num parameter in the URL.
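The URL pattern above can be sketched with a small helper. This is purely illustrative; the function is not part of se-scraper itself.

```javascript
// Hypothetical helper that builds a Google search URL with the num
// parameter, requesting up to numResults results on a single page.
function buildSearchUrl(query, numResults) {
  const params = new URLSearchParams({ q: query, num: String(numResults) });
  return `https://www.google.de/search?${params.toString()}`;
}

console.log(buildSearchUrl('scraping', 100));
// https://www.google.de/search?q=scraping&num=100
```

As noted above, requesting 100 results per page tends to get the scraper blocked faster than paging through the default 10-result SERPs.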
if you can filter the proxies per country
On scrapeulous.com you can select the region to scrape from. All requests come from a similar IP range block.
I don't care how it's organized. I need positions 1 to 99. You can do it however is most efficient for you.
It would be perfect if you could choose the IP per zone. The SERPs change drastically depending on the IP. I'm building code that depends heavily on these results and I need a good SERP scraper. Let me know if you are open to chatting about it.
I am not developing customized scrapers right now. I am developing this scraper because I use it internally and on my scraping service.
Multiple pages are now supported in se-scraper; there is an option
num_pages: 3,
that you can pass to the scraper.
On scrapeulous.com, it will take some days until I support multiple pages.
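A minimal config sketch using the new num_pages option. The field names other than num_pages are assumptions based on the project README, not confirmed in this thread.

```javascript
// Sketch of a se-scraper configuration requesting three SERP pages
// per keyword. Only num_pages is confirmed above; the other fields
// are assumed from the README.
const config = {
  search_engine: 'google',  // assumed: which engine to scrape
  keywords: ['scraping'],   // assumed: one entry per keyword
  num_pages: 3,             // scrape the first three result pages per keyword
};

// The config would then be passed to the scraper roughly like:
//   const se_scraper = require('se-scraper');
//   se_scraper.scrape(config, (err, results) => console.log(results));
console.log(config.num_pages);
```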
How to solve the following problem: TimeoutError: waiting for selector "#center_col" failed: timeout 8000ms exceeded
Do you have network issues? I consider 8 seconds long enough to wait for the first Google HTML.
But I will make it configurable in the next release.
Can we put in queries instead of plain keywords? For example: intitle:scraper
Yes, you can put in whatever query you want. The whole string is passed to the search engines as-is.
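Since queries are forwarded verbatim, advanced search operators can go straight into the keywords list. These example queries are illustrative, not from the thread.

```javascript
// Each entry is sent to the search engine unchanged, so query
// operators like intitle:, site:, and exact-phrase quotes work.
const keywords = [
  'intitle:scraper',
  'site:github.com puppeteer',
  '"search engine scraping"',
];

console.log(keywords.join('\n'));
```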
Hi, is it possible to make a full-page screenshot of the SERP? If I add --screenshot to ADDITIONAL_CHROME_FLAGS I get this:
Error: Failed to launch chrome!
[0319/115015.357805:ERROR:headless_shell.cc(583)] Capture screenshot is disabled when remote debugging is enabled.
Is there a way to disable remote debugging? I can't find the flag causing this problem.
Hi there, does this repo still work?