Search-Engines-Scraper Feature request Yandex and Baidu

Feature request Yandex and Baidu

Open LeoJavaAI opened this issue 4 years ago • 2 comments

Thanks for your work. Please consider adding Yandex and Baidu if possible.

Apr 17 '21 12:04 LeoJavaAI

Sounds interesting, I'll see what I can do. I think Yandex is simple enough, but I don't know if we can scrape Baidu without Selenium and I'd like to avoid that.

Apr 21 '21 22:04 tasos-py

After some research, I don't think I can add Yandex or Baidu. Yandex keeps giving me a captcha after a couple of requests. Maybe Selenium could help with that, but I want to keep this repo as simple as possible, so I'd rather not add browser automation or OCR dependencies.

Baidu doesn't require Selenium. The problem is that its results don't contain direct links; the links look like this: www.baidu.com/link?url=kh39xCQVnS7frJSxGrpfLAXdudtflGhAhAK8YjhSgpwyf0Sl8L41EGODywKx6Vvqy8UbcOnNGkuEntr1m9KLmq. The url= parameter looks like a base64 string, but it doesn't decode to text, and I don't think the decoding/decryption happens on the client side; the server redirects to the final link. We could request each link from the server to get the actual URLs, but that would be very inefficient and would probably result in bans.
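
For anyone who wants to repeat the base64 check, here is a minimal sketch using only the standard library. The helper names extract_token and decode_as_text are made up for illustration and are not part of this repo, and the sketch assumes the token is URL-safe base64 with its padding stripped, which is only a guess based on how it looks.

```python
import base64
from urllib.parse import urlparse, parse_qs


def extract_token(link):
    """Return the value of the url= query parameter, or None if absent."""
    values = parse_qs(urlparse(link).query).get("url")
    return values[0] if values else None


def decode_as_text(token):
    """Try to read the token as URL-safe base64 holding UTF-8 text.

    Returns the decoded string, or None when the token is not valid
    base64 or the decoded bytes are not valid UTF-8.
    """
    # Assumption: padding was stripped, so restore it before decoding.
    padded = token + "=" * (-len(token) % 4)
    try:
        return base64.urlsafe_b64decode(padded).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None


if __name__ == "__main__":
    link = (
        "www.baidu.com/link?url=kh39xCQVnS7frJSxGrpfLAXdudtflGhAhAK8YjhSgp"
        "wyf0Sl8L41EGODywKx6Vvqy8UbcOnNGkuEntr1m9KLmq"
    )
    # Prints the decoded text if the token is plain base64 text, else None.
    print(decode_as_text(extract_token(link)))
```

A None result only rules out plain base64-encoded UTF-8; it doesn't say anything about what encoding or encryption Baidu actually uses.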

So I don't know how to proceed. If you have any ideas, I'd love to hear them.

Apr 28 '21 07:04 tasos-py
