Recursive or scroll tweet scraping misses tweets hidden behind 'Show more replies' button

Open ladopixel opened this issue 1 year ago • 7 comments

Is it possible to scrape the text of the comments on a given tweet? I get the number of comments but not the comment text. I am using this → snscrape --jsonl twitter-tweet <id_tweet>

ladopixel avatar Oct 19 '22 04:10 ladopixel

Use --scroll or --recurse (--scroll just gets the replies; --recurse also gets the replies to the replies, and so on, and is significantly slower).
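As a rough illustration of the two options, here is a minimal sketch that builds the command line for either mode and parses the JSONL output. The helper names (build_command, parse_jsonl, fetch_thread) are hypothetical and not part of snscrape; the flags are the ones mentioned in this thread, and the sketch assumes the snscrape executable is on the PATH.

```python
import json
import subprocess


def build_command(tweet_id, mode="scroll"):
    """Build the snscrape CLI call for one tweet (hypothetical helper)."""
    if mode not in ("scroll", "recurse"):
        raise ValueError("mode must be 'scroll' or 'recurse'")
    # --jsonl goes before the scraper name, the mode flag after it,
    # matching the commands quoted in this thread.
    return ["snscrape", "--jsonl", "twitter-tweet", f"--{mode}", str(tweet_id)]


def parse_jsonl(text):
    """Turn JSONL text (one JSON object per line) into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_thread(tweet_id, mode="scroll"):
    """Run snscrape and return the parsed items (needs network access)."""
    result = subprocess.run(
        build_command(tweet_id, mode),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_jsonl(result.stdout)
```

Calling fetch_thread(tweet_id, mode="recurse") would follow the same path with --recurse, which is expected to take noticeably longer on large threads.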

TheTechRobo avatar Oct 19 '22 12:10 TheTechRobo

Ok, thanks!!

It works almost perfectly: the tweet has 43 comments, but only 41 are returned, and I don't know why. The same thing happens with both methods (--scroll and --recurse).

import json

import snscrape.modules.twitter as sntwitter

array_comments = []
tweet_id = input('Enter tweet id: ')

# Iterate over everything the scraper returns in SCROLL mode
scraper = sntwitter.TwitterTweetScraper(tweetId=tweet_id, mode=sntwitter.TwitterTweetScraperMode.SCROLL)
for i, tweet in enumerate(scraper.get_items()):
    # Round-trip through JSON to get a plain dict for each item
    array_comments.append(json.loads(tweet.json()))
    print(f'→ {array_comments[i]["rawContent"]}')

ladopixel avatar Oct 19 '22 15:10 ladopixel

Twitter's counters are not entirely reliable. You will often see a lower number of actual results – not just with snscrape but also in a browser.
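One way to quantify such a discrepancy is to compare the counter reported on the root tweet with the number of direct replies actually collected. The sketch below works on the dicts parsed from the --jsonl output; the field names id, replyCount and inReplyToTweetId are assumed to match the JSON snscrape emits, and summarize_replies is a hypothetical helper.

```python
def summarize_replies(items, root_id):
    """Compare the root tweet's reported reply count with collected replies.

    `items` is a list of dicts parsed from snscrape's JSONL output.
    Field names are assumptions about that output.
    """
    root = next((it for it in items if it.get("id") == root_id), None)
    # Direct replies are items that point back at the root tweet.
    direct = [it for it in items if it.get("inReplyToTweetId") == root_id]
    reported = root.get("replyCount") if root is not None else None
    gap = None if reported is None else reported - len(direct)
    return {"reported": reported, "collected": len(direct), "gap": gap}
```

A positive gap does not by itself prove that the scraper missed anything, since the counter may include replies that are deleted, protected or otherwise unavailable.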

JustAnotherArchivist avatar Oct 19 '22 17:10 JustAnotherArchivist

The issue is that when I display them, I see that I'm actually missing the two that are hidden under the "Show more" button on Twitter.

ladopixel avatar Oct 19 '22 17:10 ladopixel

I see. That sounds like a bug, yeah. Which tweet is it, and which two are missing?

JustAnotherArchivist avatar Oct 19 '22 17:10 JustAnotherArchivist

Tweet → This

  1. Comment of [Armenek]
  2. Comment of [djuwadiprints] (the two that are hidden under the "Show more" button)

ladopixel avatar Oct 19 '22 17:10 ladopixel

Does it return the replies to a specific tweet (I pass the original tweet)? I am new here and can't find a way. When I search by the conversation ID, I sometimes don't get any results. I am looking for replies to historical tweets.

Jannatul1607551 avatar Dec 01 '22 03:12 Jannatul1607551

I keep running into the same problem. After the update, I still can't read the comment that appears behind the "Show more" button.

ladopixel avatar Feb 19 '23 06:02 ladopixel

Yeah, I fixed it, but then I changed something else which broke it again.

JustAnotherArchivist avatar Feb 19 '23 06:02 JustAnotherArchivist

I'm still working on other things, but current master (c65e36a0) should work correctly now.

JustAnotherArchivist avatar Feb 19 '23 06:02 JustAnotherArchivist

I can't get it to work, I hope you can fix it soon. Thanks a lot for your work.

ladopixel avatar Feb 19 '23 06:02 ladopixel

You're going to need to share more information then. snscrape twitter-tweet --scroll 1577907836356644865 with the latest master (plus some extra code on top that shouldn't affect this) returns 1577992565542080513 as the last result here. The other one you mentioned seems to have vanished since October. I'm aware that the 'offensive' replies button is broken (fixed locally but not pushed yet), but that's not what this issue is about.
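A reproducible report could state exactly which reply IDs are expected but absent from the output. The following is a minimal sketch under the assumption that the output was saved with --jsonl and that each line carries an id field; load_jsonl and missing_ids are hypothetical helpers, not snscrape functions.

```python
import json


def load_jsonl(path):
    """Read a JSONL file (one JSON object per line) into a list of dicts."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def missing_ids(items, expected_ids):
    """Return the expected tweet IDs that do not appear in the scraped items."""
    seen = {it.get("id") for it in items}
    return [tweet_id for tweet_id in expected_ids if tweet_id not in seen]
```

For example, passing the saved output and a list containing 1577992565542080513 would return an empty list if that reply was scraped, and the ID itself if it was not.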

JustAnotherArchivist avatar Feb 19 '23 06:02 JustAnotherArchivist

Until there is a reproducible example that doesn't work with the current master, I'm going to consider this fixed.

JustAnotherArchivist avatar Feb 21 '23 04:02 JustAnotherArchivist

I will try to test it during the day and let you know the result. Thank you very much!

ladopixel avatar Feb 21 '23 05:02 ladopixel

It works perfectly! 💛

ladopixel avatar Feb 21 '23 05:02 ladopixel