
Hey everyone.

NikolaiT opened this issue 6 years ago • 15 comments

Please leave your bug reports and requests for features here. I will maintain this package in the future.

NikolaiT avatar Dec 29 '18 17:12 NikolaiT

Is there possibility to scrape more than first page? Like 20/50/100 results per keyword? What about anti-bot reCaptcha? Maybe try to implement uncaptcha2? https://github.com/ecthros/uncaptcha2

Thank you for making this great scraper, one day I hope I will help with the development :D

shadaxv avatar Jan 06 '19 20:01 shadaxv

Hey, I would like to know whether it is possible to scrape more than the first page, too. Thanks!

alvin-leong avatar Jan 18 '19 01:01 alvin-leong

What about support for duckduckgo as search engine?

hubitor avatar Jan 23 '19 16:01 hubitor

Is there possibility to scrape more than first page?

danipolo avatar Jan 29 '19 02:01 danipolo

Lol why am I creating this issue and then ignoring it? Sorry guys.

Is there possibility to scrape more than first page? Like 20/50/100 results per keyword?

I will implement this for google and bing as a start.

However, I will not implement changing the number of SERP results per page to 20/50/100, because in my experience Google then blocks the scraper very quickly. Can anybody confirm?

What about support for duckduckgo as search engine?

I will implement it right now.

Thanks for your comments guys!

Shameless advertisement because I am a broke guy: if you want to scrape in large quantities, just use my paid service: https://scrapeulous.com/

NikolaiT avatar Jan 29 '19 18:01 NikolaiT

I would be open to paying for the service if:

  • you can scrape up to 100 results per keyword
  • if you can filter the proxies per country
  • it's cheap and well maintained

danipolo avatar Jan 29 '19 18:01 danipolo

uncaptcha2 is seriously nice. Why didn't I know that it exists?

you can scrape up to 100 results per keyword

Does it matter to you whether you get 10 pages with 10 results each, or one page with 100 results?

For example, one can scrape with a URL such as https://www.google.de/search?q=scraping&num=100, but in my experience you then tend to get blocked much quicker than when the num parameter is left out.
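For illustration, num is an ordinary query-string parameter, so the URL above can be assembled like this (a sketch using Node's built-in URL class; the values are the ones from this thread):

```javascript
// Build a Google SERP URL that requests 100 results per page.
// Note: per the discussion above, setting num tends to get the
// scraper blocked faster than leaving it out.
const url = new URL('https://www.google.de/search');
url.searchParams.set('q', 'scraping');  // the keyword
url.searchParams.set('num', '100');     // results per page
console.log(url.toString());
// https://www.google.de/search?q=scraping&num=100
```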

if you can filter the proxies per country

On scrapeulous.com you can select the region to scrape from. All requests come from a similar IP range block.

NikolaiT avatar Jan 29 '19 18:01 NikolaiT

I don't care how it's organized. I need positions 1 to 99. You can do it however you think is most efficient for you.

It would be perfect if you could choose the IP per zone. The SERPs change drastically depending on the IP. I'm building code that depends heavily on these results, and I need a good SERP scraper. Let me know if you are open to chat about it.

danipolo avatar Jan 30 '19 00:01 danipolo

I am not developing customized scrapers right now. I am developing this scraper because I use it internally and on my scraping service.

Multiple pages are now supported in se-scraper; there is an option

num_pages: 3,

that you can pass to the scraper.
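A minimal sketch of such a configuration, assuming the option names commonly shown in the se-scraper README (search_engine, keywords) alongside the num_pages option from this thread; check the README for the exact current API:

```javascript
// Hypothetical se-scraper configuration: scrape the first three
// result pages for each keyword on Google.
const scrape_job = {
  search_engine: 'google',              // or 'bing', 'duckduckgo', ...
  keywords: ['scraping', 'se-scraper'], // one SERP scrape per keyword
  num_pages: 3,                         // the option discussed above
};

// Usage would be along the lines of:
//   const se_scraper = require('se-scraper');
//   se_scraper.scrape(scrape_job, (err, results) => { /* ... */ });
```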

On scrapeulous.com, it will take some days until I support multiple pages.

NikolaiT avatar Jan 30 '19 20:01 NikolaiT

How do I solve the following problem: TimeoutError: waiting for selector "#center_col" failed: timeout 8000ms exceeded

wwwxmu avatar Feb 19 '19 15:02 wwwxmu

How do I solve the following problem: TimeoutError: waiting for selector "#center_col" failed: timeout 8000ms exceeded

Do you have network issues? I consider 8 seconds long enough to wait for the first Google HTML.

But I will make it configurable in the next release.
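The underlying pattern for a configurable timeout (not se-scraper's actual code, just a sketch of the idea) is to race the wait against a timer:

```javascript
// Race an arbitrary promise against a configurable timeout,
// rejecting with a TimeoutError-style message when the timer wins.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`waiting for ${label} failed: timeout ${ms}ms exceeded`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage with a Puppeteer page object:
//   await withTimeout(page.waitForSelector('#center_col'), 30000, 'selector "#center_col"');
```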

NikolaiT avatar Feb 27 '19 20:02 NikolaiT

Can we put queries instead of keywords? For example: intitle:scraper

Gunnerforlife avatar Mar 05 '19 08:03 Gunnerforlife

Can we put queries instead of keywords? For example: intitle:scraper

Yes, you can put in whatever query you want. The whole string is passed to the search engine as the query.
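For instance, a query with advanced operators is just text that gets URL-encoded like any other keyword (a sketch; the operator syntax belongs to the search engine, not to the scraper):

```javascript
// Advanced operators such as intitle: travel inside the q parameter;
// the scraper only needs to URL-encode the raw query string.
const query = 'intitle:scraper site:github.com';
const serpUrl = 'https://www.google.com/search?q=' + encodeURIComponent(query);
console.log(serpUrl);
// https://www.google.com/search?q=intitle%3Ascraper%20site%3Agithub.com
```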

NikolaiT avatar Mar 05 '19 14:03 NikolaiT

Hi, is it possible to make a full-page screenshot of SERP? If I add --screenshot to ADDITIONAL_CHROME_FLAGS I obtain this:

Error: Failed to launch chrome!
[0319/115015.357805:ERROR:headless_shell.cc(583)] Capture screenshot is disabled when remote debugging is enabled.

Is there a way to disable remote debugging? Can't find the flag causing this problem.

nittolese avatar Mar 19 '19 11:03 nittolese

Hi there, does this repo still work?

Aditya94A avatar Nov 15 '20 04:11 Aditya94A