weiboSpider

Crawling Problem

Open chauphamcreditproduct opened this issue 4 months ago • 5 comments

Hi, I hope you're doing well! I’m reaching out because I’m having some trouble crawling posts from an account. The account has 4,795 posts, but I’ve only been able to crawl around 1,663 posts so far. I’ve already tried adjusting the cookies, but it doesn’t seem to be working. Is there anything else I can do to fix this? I’d really appreciate any advice or suggestions you can share! Thanks so much, Chloe Pham

chauphamcreditproduct avatar Aug 05 '25 18:08 chauphamcreditproduct

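Thanks for the feedback. Try to set the since_date parameter to now; other values will cause posts to be missed. Alternatively, use the cookie-free version (preferably with a valid cookie added).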

dataabc avatar Aug 06 '25 06:08 dataabc

I have already changed since_date to now and also tried replacing the cookie, but it still doesn't work. It can only crawl data from August 2023 to the present.

chauphamcreditproduct avatar Aug 06 '25 10:08 chauphamcreditproduct

It is probably an API restriction. Try switching to weibo-crawler and see if that helps.

dataabc avatar Aug 06 '25 10:08 dataabc

https://github.com/dataabc/weibo-crawler

Just to confirm — you mean I should use this link, right? https://github.com/dataabc/weibo-crawler

And it still runs using the weibo.py file, correct?

Thanks

Chloe

chauphamcreditproduct avatar Aug 06 '25 11:08 chauphamcreditproduct

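Yes. See its README for details. This is the cookie-free version: it can run without a cookie, but without one it often cannot crawl everything, so you should add a valid cookie.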

dataabc avatar Aug 06 '25 14:08 dataabc
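
For readers following along, here is a minimal sketch of the settings change suggested in this thread. It assumes the crawler reads a JSON config file with a `since_date` key (the parameter named above) and a `cookie` key. The `cookie` key name, the file path, and every function name below are illustrative assumptions, not taken from the thread, so check the project's README for the actual layout.

```python
import json
from pathlib import Path


def apply_thread_advice(config: dict, cookie: str = "") -> dict:
    """Return a copy of the config with since_date set to "now".

    If a non-empty cookie is given, it is stored under the "cookie" key
    (an assumed key name; verify it against the README).
    """
    updated = dict(config)
    updated["since_date"] = "now"
    if cookie:
        updated["cookie"] = cookie
    return updated


def coverage(crawled: int, total: int) -> float:
    """Fraction of the account's posts that were actually crawled."""
    return crawled / total


def update_config_file(path: str, cookie: str = "") -> None:
    """Rewrite a JSON config file in place (the path is hypothetical)."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    new_config = apply_thread_advice(config, cookie)
    config_path.write_text(
        json.dumps(new_config, ensure_ascii=False, indent=4),
        encoding="utf-8",
    )


if __name__ == "__main__":
    # Numbers from the original report: 1,663 of 4,795 posts crawled.
    print(f"coverage: {coverage(1663, 4795):.1%}")
```

Comparing the crawled count with the account's post total after each run is a quick way to tell whether a change of settings or a switch to weibo-crawler actually improved coverage.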