Weibo: "User does not exist" when using --name on certain accounts

Open TheTechRobo opened this issue 2 years ago • 10 comments

Haven't tested with user IDs.

~/u/steam ❯❯❯ python3 parse_weibo.deduped >> weibo.jsonl
['snscrape', '--jsonl', '--progress', 'weibo-user', '--name', 'fangshengmeng']
2022-04-03 22:27:12.723  WARNING  snscrape.modules.weibo  User does not exist
Finished, 0 results
['snscrape', '--jsonl', '--progress', 'weibo-user', '--name', 'qukean']
2022-04-03 22:27:14.052  WARNING  snscrape.modules.weibo  User does not exist
Finished, 0 results
['snscrape', '--jsonl', '--progress', 'weibo-user', '--name', 'yetuavg']
2022-04-03 22:27:15.280  WARNING  snscrape.modules.weibo  User does not exist
Finished, 0 results
['snscrape', '--jsonl', '--progress', 'weibo-user', '--name', 'zhangfrank110']
2022-04-03 22:27:16.472  WARNING  snscrape.modules.weibo  User does not exist
Finished, 0 results

With verbose output (I can't get the locals because it didn't crash; there should be an option to dump them anyway):

~/u/steam ❯❯❯ snscrape -v --progress --jsonl weibo-user --name fangshengmeng  2
2022-04-03 22:29:29.953  INFO  snscrape.base  Retrieving https://m.weibo.cn/n/fangshengmeng
2022-04-03 22:29:31.017  INFO  snscrape.base  Retrieved https://m.weibo.cn/n/fangshengmeng: 200
2022-04-03 22:29:31.017  WARNING  snscrape.modules.weibo  User does not exist
2022-04-03 22:29:31.017  INFO  snscrape._cli  Done, found 0 results
Finished, 0 results

Also it seems really unintuitive to have to add --name as an option if it's not a user ID; could this be fixed like it was with the Twitter scraper, i.e. seeing if it's an int?

TheTechRobo avatar Apr 04 '22 02:04 TheTechRobo

Can reproduce that with those names, but as far as I can tell, none of them exist (or their profiles require logging in, perhaps?). Others work correctly. Random example: Angelinazhaoooo (though it crashes with a KeyError on the video extraction very quickly).

The Twitter scraper also has an explicit flag, --user-id. Automatic detection for that obviously breaks when someone has a username composed solely of digits.
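
To illustrate the ambiguity, here is a minimal sketch of digit-based detection. The function `guess_identifier_kind` is hypothetical and not part of snscrape; it only shows the heuristic being discussed, not how any scraper actually behaves.

```python
def guess_identifier_kind(identifier: str) -> str:
    """Hypothetical heuristic: treat all-digit input as a user ID."""
    # str.isdecimal() is True only for non-empty strings whose characters
    # are all decimal digits.
    return "user-id" if identifier.isdecimal() else "name"


# A numeric user ID is classified as intended.
print(guess_identifier_kind("1223717857"))
# An ordinary username is classified as intended.
print(guess_identifier_kind("qukean"))
# A username made only of digits cannot be told apart from a user ID,
# so the heuristic would silently pick the wrong kind of lookup.
print(guess_identifier_kind("20220403"))
```

An explicit flag avoids the guess entirely, at the cost of a little extra typing.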

JustAnotherArchivist avatar Apr 04 '22 03:04 JustAnotherArchivist

Also, to dump locals on every WARNING or higher, there is a global option: --dump-locals. (Yes, it should probably get a better name.)

JustAnotherArchivist avatar Apr 04 '22 03:04 JustAnotherArchivist

> The Twitter scraper also has an explicit flag, --user-id. Automatic detection for that obviously breaks when someone has a username composed solely of digits.

Oh, I guess that's true.

TheTechRobo avatar Apr 04 '22 15:04 TheTechRobo

I could have sworn they existed when I loaded it up into a browser, but maybe I'm wrong. Sorry for opening this invalid issue, I guess.

Wait, I think https://weibo.com/qukean exists.

TheTechRobo avatar Apr 04 '22 15:04 TheTechRobo

Maybe, but that's behind a login wall. The mobile site, which is publicly accessible and therefore used by snscrape, says it doesn't exist: https://m.weibo.cn/n/qukean

JustAnotherArchivist avatar Apr 04 '22 16:04 JustAnotherArchivist

Oh, I'm using weibo.com. Is that different? I don't have to log in for weibo.com/qukean:

[image]

TheTechRobo avatar Apr 04 '22 20:04 TheTechRobo

Yeah, it's not really a login, but it's an auth system of sorts with awful JS stuff to get cookies for accessing weibo.com (that I didn't want to reimplement). It is the same service though, so it's interesting that this profile is only accessible on weibo.com but not on m.weibo.cn.

JustAnotherArchivist avatar Apr 04 '22 22:04 JustAnotherArchivist

Oh yeah, I noticed that redirect. Sounds very annoying to bypass or mimic.

TheTechRobo avatar Apr 05 '22 01:04 TheTechRobo

Yeah, the only way to fix this would be to reimplement that auth flow. Not something I'll tackle anytime soon, I think.

JustAnotherArchivist avatar Apr 05 '22 04:04 JustAnotherArchivist

It's only the name resolution which is the problem here, it seems. qukean is user ID 1223717857, and that works fine on the mobile site (and consequently with snscrape). The name resolution on weibo.com is still behind the same auth flow though, so this insight doesn't really change anything, but at least you can manually work around it by observing the user ID in the network monitor when loading the profile page and then using that.
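
The workaround can be sketched as follows. It assumes, as this thread implies, that `weibo-user` takes the user ID as its positional argument when `--name` is omitted; `build_weibo_command` is a hypothetical helper, not part of snscrape. It only builds and prints the command line, mirroring the argument lists shown in the original report.

```python
import shlex
from typing import List, Optional


def build_weibo_command(name: Optional[str] = None,
                        user_id: Optional[int] = None) -> List[str]:
    """Build an snscrape argument list for weibo-user (hypothetical helper)."""
    if (name is None) == (user_id is None):
        raise ValueError("pass exactly one of name or user_id")
    command = ["snscrape", "--jsonl", "--progress", "weibo-user"]
    if user_id is not None:
        # Assumption: without --name, the positional argument is the user ID.
        command.append(str(user_id))
    else:
        command += ["--name", name]
    return command


# 1223717857 is the user ID reported for qukean in this thread.
print(shlex.join(build_weibo_command(user_id=1223717857)))
```

The user ID itself still has to be found by hand, for example in the browser's network monitor, since name resolution is what sits behind the auth flow.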

JustAnotherArchivist avatar Apr 15 '22 03:04 JustAnotherArchivist