Extend ThreadRetriever to also retrieve parent posts in a thread
I am trying to crawl a discussion on Twitter. As an example, let's pick the Tweet with ID 1299724507377274883.
By running both thread and reply I get:
$ nasty r --tweet-id 1299724507377274883
Received 3 consecutive empty batches.
$ nasty t --tweet-id 1299724507377274883
Received 3 consecutive empty batches.
My intent is to download both the original post ("Restaurant and hotel workers are receiving eye-expression training as they try to deliver service with a smile while the smile is out of service") and its reply ("I'm kind of glad I don't have to smile all the time.").
Is there something I am doing wrong?
PS: Thanks a lot for this wonderful package. Although it needs maintenance, it is really well designed :+1:
Glad to hear you like the package overall, or at least its design :)
I'm sorry to say that both the `thread` and `replies` commands seem to work as expected, i.e., they only return child posts of the submitted Tweet-ID.
I understand that retrieving parent posts, and especially the post belonging to the requested Tweet-ID itself, would be very valuable. Indeed, the JSONs Twitter returns would support this.
The reason this has not been implemented yet is that it would complicate the design quite a bit. Currently, both `thread` and `replies` are operations that take a Tweet-ID and return an iterable of Tweet objects. This enables an IMO intuitive Python API and also makes for a straightforward CLI where the JSONs of the resulting Tweets are written to stdout.
So far, I'm not sure what a good API for returning more complicated hierarchies of Tweets would look like. For instance, Twitter also often returns some replies of replies, which we currently just throw away. I have renamed the issue to make it clear that this would be an enhancement of current functionality. If someone has an elegant idea of how the API should look for something like this, I could get around to implementing it sometime.
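To make the design question concrete, here is one possible shape, purely as a sketch and not nasty's actual API. It keeps the "Tweet-ID in, flat iterable out" contract and tags each Tweet with its relation to the requested one, so a CLI could still write one JSON per line. Everything here is hypothetical: the `Tweet` stand-in class, the `in_reply_to_id` field, `Relation`, `ThreadEntry`, and `walk_thread` are illustration names, and the sketch assumes each Tweet carries the ID of the Tweet it replies to.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Iterator, List, Optional


class Relation(str, Enum):
    PARENT = "parent"        # ancestor of the requested Tweet
    REQUESTED = "requested"  # the Tweet whose ID was submitted
    CHILD = "child"          # reply, or reply of a reply


@dataclass(frozen=True)
class Tweet:
    """Hypothetical stand-in, not nasty's Tweet class."""
    id: str
    text: str
    in_reply_to_id: Optional[str] = None  # assumed field


@dataclass(frozen=True)
class ThreadEntry:
    tweet: Tweet
    relation: Relation


def walk_thread(tweets: Iterable[Tweet], requested_id: str) -> Iterator[ThreadEntry]:
    """Yield ancestors (oldest first), the requested Tweet, then descendants."""
    by_id = {t.id: t for t in tweets}
    if requested_id not in by_id:
        return

    # Follow reply links upwards; `seen` guards against cyclic data.
    ancestors: List[Tweet] = []
    current = by_id[requested_id]
    seen = {current.id}
    while current.in_reply_to_id in by_id and current.in_reply_to_id not in seen:
        current = by_id[current.in_reply_to_id]
        seen.add(current.id)
        ancestors.append(current)
    for tweet in reversed(ancestors):
        yield ThreadEntry(tweet, Relation.PARENT)

    yield ThreadEntry(by_id[requested_id], Relation.REQUESTED)

    # Index replies by parent ID, then walk descendants depth-first.
    children: Dict[str, List[Tweet]] = {}
    for tweet in by_id.values():
        if tweet.in_reply_to_id is not None:
            children.setdefault(tweet.in_reply_to_id, []).append(tweet)
    stack = list(reversed(children.get(requested_id, [])))
    while stack:
        tweet = stack.pop()
        if tweet.id in seen:
            continue
        seen.add(tweet.id)
        yield ThreadEntry(tweet, Relation.CHILD)
        stack.extend(reversed(children.get(tweet.id, [])))


if __name__ == "__main__":
    # Placeholder IDs and texts, not real Tweets.
    sample = [
        Tweet("1", "original post"),
        Tweet("2", "a reply", in_reply_to_id="1"),
        Tweet("3", "a reply to the reply", in_reply_to_id="2"),
    ]
    for entry in walk_thread(sample, "2"):
        print(json.dumps({"relation": entry.relation.value, "id": entry.tweet.id}))
```

The trade-off in this sketch is that consumers who only want child posts have to filter on `relation`, whereas a nested tree structure would express the hierarchy directly but would no longer map onto one JSON object per output line.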