
Large Twitter Galleries Not Fully Downloading

Open ghost opened this issue 2 years ago • 21 comments

I tried to download every MP4 from this gallery (NSFW), and it only went as far back as this tweet (also NSFW). After that tweet, it just stopped and acted as though it had downloaded the entire gallery, meaning that any older tweets, such as this one, were excluded.

If I don't use the link for the media tab, it stops at an even more recent tweet (NSFW).

For reference, the command I ran was gallery-dl "https://twitter.com/furui_1111/media" --filter "extension in ('mp4')"

ghost avatar Jan 25 '22 17:01 ghost

This is a limit on twitter's end, unfortunately.

https://github.com/mikf/gallery-dl/issues/1396

Zoodee avatar Jan 25 '22 17:01 Zoodee

I personally ended up writing a PowerShell script that cycles through two-week intervals, starting from the user's registration date, using Twitter search queries. Each interval overlaps its neighbors by one day in both since: and until: so nothing gets skipped (with include:nativeretweets in the search and &f=live instead of &f=top in the URL).

Even still, I don't think that will get everything, because Twitter is dumb like that. It caps out retrieving a user's timeline at around 3200 tweets (and that includes retweets).
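
For illustration, the two-week interval approach described above could be sketched roughly like this in Python (the original script was PowerShell and is not shown in this thread). The function names, the example dates, and the use of Twitter's since: and until: search operators for the date bounds are assumptions made for this sketch; USER is a placeholder.

```python
from datetime import date, timedelta
from urllib.parse import quote


def search_windows(start, end, span_days=14, overlap_days=1):
    """Yield (since, until) date pairs covering start..end.

    Each window is widened by overlap_days on both sides so tweets
    close to a window boundary are not skipped.
    """
    step = timedelta(days=span_days)
    overlap = timedelta(days=overlap_days)
    since = start
    while since < end:
        until = min(since + step, end)
        yield since - overlap, until + overlap
        since = until


def search_url(user, since, until):
    """Build a search URL with spaces encoded as %20 (hypothetical helper)."""
    query = (
        f"from:{user} since:{since.isoformat()} "
        f"until:{until.isoformat()} include:nativeretweets"
    )
    return "https://twitter.com/search?q=" + quote(query, safe=":") + "&f=live"


# Example dates only; a real run would start at the account's registration date.
windows = list(search_windows(date(2022, 1, 1), date(2022, 2, 1)))
urls = [search_url("USER", since, until) for since, until in windows]
print("\n".join(urls))
```

Each printed URL could then be passed to gallery-dl in turn, or collected in an input file.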

Ghost-Terms avatar Jan 28 '22 17:01 Ghost-Terms

What usually works for me is using a search query to download: gallery-dl "https://twitter.com/search?q=(from:USER)"

tux93 avatar Feb 10 '22 20:02 tux93

Twitter shadow-hides all NSFW content from its search feature, so I think there's no way to download everything that way; you'd have to use https://stevesie.com/apps/twitter-api/scrape/tweets/by-user

wankio avatar May 25 '22 14:05 wankio

From my experience, it seems people usually don't flag stuff as "sensitive content". I've seen that with both art and irl stuff. I've been using the api to get the total tweet count of 100% nsfw accounts and compared it against how much I could get from the search results and it's usually most of the tweets. I tried searching a nsfw tag in the browser and tweets marked by Twitter as "sensitive content" (ones that require you to verify you want to see it) still popped up occasionally. Here's how they determine what's allowed in search results: https://help.twitter.com/en/using-twitter/twitter-search-not-working

Twi-Hard avatar May 25 '22 15:05 Twi-Hard

it's been blocked since 2020

wankio avatar May 25 '22 16:05 wankio

Bumping this because I'm trying to consolidate my various twitter archives. I have a lot of content that was downloaded from twMediaDownloader and I planned to merge this content with gallery-dl using twitter-click-and-save to minimize file duplication by hardlinking across the drive.

My example case is casulcasulcasul with the following settings:

"twitter":
        {
            "sleep": 3.0,
            "sleep-request": 3.0,
            "archive": "/run/media/xxx/bfd18/dl/gallery-dl/sql/twitter.sqlite3",
            "archive-format": "{author[name]}—{date:%Y.%m.%d}—{tweet_id}—{filename}",
            "username": "xxx",
            "password": "yyy",
            "cards": false,
            "conversations": false,
            "quoted": false,
            "replies": true,
            "retweets": false,
            "text-tweets": false,
            "twitpic": false,
            "users": "timeline",
            "videos": true,
            "filename": "[twitter] {author[name]}—{date:%Y.%m.%d}—{tweet_id}—{filename}.{extension}"
        },

As this account seems to fall under 1000 media tweets, I try gallery-dl https://twitter.com/casulcasulcasul/media.

Out of the 970 media files twMediaDownloader calculated (using dryrun), gallery-dl using the above command downloaded 944, seemingly omitting anything earlier than November 2019.

Using gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)" we get a bit further (but the process is much slower): This results in 968 files...

I don't quite know what/where the issue is, but the two tweets gallery-dl seems to miss are this one and this one. Manually grabbing those tweets with gallery-dl downloads just fine.

EDIT: I guess theoretically, you could use twMediaDownloader to generate a list of media tweets and use the .csv file it provides as input for gallery-dl. :thinking:
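
A rough sketch of that idea in Python, assuming the exported .csv contains full tweet URLs somewhere in its cells (the actual column layout of twMediaDownloader's file is not shown in this thread, so the sample data and the function name below are hypothetical):

```python
import csv
import io
import re

# Matches tweet permalinks such as https://twitter.com/username/status/123
TWEET_URL = re.compile(r"https://twitter\.com/\w+/status/\d+")


def tweet_urls_from_csv(csv_file):
    """Return the unique tweet URLs found in any cell, in first-seen order."""
    seen = {}
    for row in csv.reader(csv_file):
        for cell in row:
            for url in TWEET_URL.findall(cell):
                seen.setdefault(url, None)
    return list(seen)


# Hypothetical sample data standing in for the real export.
sample = io.StringIO(
    "id,url\n"
    "1,https://twitter.com/username/status/111\n"
    "2,https://twitter.com/username/status/222\n"
    "3,https://twitter.com/username/status/111\n"
)
urls = tweet_urls_from_csv(sample)
print("\n".join(urls))
```

The resulting list, written one URL per line to a text file, could then be given to gallery-dl as an input file.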

EDIT2: Some issues suggest including filter:media in the gallery-dl command, but I have never once had this work. However, I have found that f=media works, so the full command would be gallery-dl "https://twitter.com/search?q=(from:username)&f=media". The quotes are important when running it as a single command line but aren't needed if an input file is used. This command gets more media than just gallery-dl https://twitter.com/username/media and is faster than gallery-dl "https://twitter.com/search?q=(from:username)" alone.

biggestsonicfan avatar Aug 09 '22 09:08 biggestsonicfan

Are you sure gallery-dl "https://twitter.com/search?q=(from:username)&f=media" makes a difference? From what I see it shouldn't.

You can use both the https://twitter.com/casulcasulcasul/media and https://twitter.com/search?q=(from:casulcasulcasul) links. Once you've downloaded the first one, copy the ID of the last downloaded tweet and put it in a search like this: https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE. To speed up the process you can add filter:links (https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE+filter:links).

Or you can download the latest artifact and simply paste https://twitter.com/casulcasulcasul if you don't want to bother with 2 links yourself.

nisehime avatar Aug 11 '22 13:08 nisehime

I will scream it from the rooftops:

xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=from:casulcasulcasul+max_id:1081020936185274371+filter:links
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=from:casulcasulcasul+max_id:1081020936185274371
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links"
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+filter:links"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)"
/run/media/xxx/bfd18/dl/gallery-dl/twitter/casulcasulcasul/[twitter] casulcasulcasul—2022.08.13—1558577888386920448—FaEnVbxUIAAuR8J.jpg
/run/media/xxx/bfd18/dl/gallery-dl/twitter/casulcasulcasul/[twitter] casulcasulcasul—2022.08.12—1558220785201713152—FZ-ytOZakAEydm7.jpg

Adding + or filter: will not work with my install of gallery-dl. --version output: 1.23.0-dev (linux)

biggestsonicfan avatar Aug 14 '22 12:08 biggestsonicfan

@cglmrfreeman use %20 or plain spaces instead of + signs

$ gallery-dl https://twitter.com/search?q=from:casulcasulcasul%20max_id:1081020936185274371%20filter:links
/tmp/twitter/casulcasulcasul/1081020936185274371_1.jpg
/tmp/twitter/casulcasulcasul/1071670086094643201_1.jpg
...
$ gallery-dl "https://twitter.com/search?q=from:casulcasulcasul max_id:1081020936185274371 filter:links"
/tmp/twitter/casulcasulcasul/1081020936185274371_1.jpg
/tmp/twitter/casulcasulcasul/1071670086094643201_1.jpg
...
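
If the URL is generated by a script rather than typed by hand, one possible way to get the %20 form shown above is Python's urllib.parse.quote. This is only a sketch; continuation_url is a hypothetical helper name, not part of gallery-dl.

```python
from urllib.parse import quote


def continuation_url(user, max_id, extra_filter="filter:links"):
    """Build a search URL that continues from the tweet with ID max_id."""
    terms = [f"from:{user}", f"max_id:{max_id}"]
    if extra_filter:
        terms.append(extra_filter)
    # quote() encodes each space as %20; safe=":" keeps the colons readable.
    return "https://twitter.com/search?q=" + quote(" ".join(terms), safe=":")


url = continuation_url("casulcasulcasul", 1081020936185274371)
print(url)
```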

mikf avatar Aug 14 '22 12:08 mikf

Huh, that one worked. I don't think I've ever seen anyone suggest that before. It's always "copy the Twitter search URL" (https://twitter.com/search?q=from%3Acasulcasulcasul+max_id%3A1081020936185274371+filter%3Alinks), which does not work, or "use + or & signs", which seemingly throw "Requested user could not be found".

I will def be using this from now on, thanks!

biggestsonicfan avatar Aug 14 '22 12:08 biggestsonicfan

You probably should have put the link in double quotes "https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE+filter:links" to get + working.

But as mikf said plain spaces are fine too. In double quotes as well.

nisehime avatar Aug 14 '22 15:08 nisehime

No, + signs as space replacements do not work in gallery-dl.

The function that parses query parameters does not "support" them, meaning it just returns + as is and does not replace them with a space character as might be expected.
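
The difference can be illustrated with Python's standard library. This is not gallery-dl's own parsing function, only an analogy: urllib.parse.unquote leaves + untouched, while urllib.parse.unquote_plus treats it as a space.

```python
from urllib.parse import unquote, unquote_plus

raw = "from:casulcasulcasul+max_id:1081020936185274371+filter:links"
encoded = "from:casulcasulcasul%20max_id:1081020936185274371%20filter:links"

# unquote() only decodes %XX escapes, so "+" comes back as is.
plus_kept = unquote(raw)

# unquote_plus() additionally turns "+" into a space.
plus_as_space = unquote_plus(raw)

# %20 is decoded to a space by both functions.
percent_decoded = unquote(encoded)

print(plus_kept)
print(plus_as_space)
print(percent_decoded)
```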

mikf avatar Aug 14 '22 15:08 mikf

I see. Well, it still works with twitter specifically. Pluses in a query string are just ignored by twitter (or treated as spaces).

nisehime avatar Aug 14 '22 15:08 nisehime

Oh, so the "NotFoundError"s are a bug introduced with https://github.com/mikf/gallery-dl/commit/77bdd8fe0f1702955d0746a81ea7a24c9d1bb065.

This commit splits search queries by whitespace only, and throws an error because there is no user named casulcasulcasul+max_id:1081020936185274371+filter:links
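
A simplified sketch of that failure mode (this is not the code from the commit; first_from_user is a hypothetical helper written only to show the effect of splitting on whitespace):

```python
def first_from_user(query):
    """Return the user named by the first from: term, or None."""
    for term in query.split():  # str.split() splits on whitespace only
        if term.startswith("from:"):
            return term[len("from:"):]
    return None


spaced = first_from_user(
    "from:casulcasulcasul max_id:1081020936185274371 filter:links"
)
plussed = first_from_user(
    "from:casulcasulcasul+max_id:1081020936185274371+filter:links"
)

print(spaced)   # the intended user name
print(plussed)  # the whole query is mistaken for a user name
```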

mikf avatar Aug 14 '22 15:08 mikf

Ah I only recently started using gallery-dl for twitter archiving and I definitely updated after that, so that might explain it.

biggestsonicfan avatar Aug 14 '22 16:08 biggestsonicfan

I see x2, I'm on latest stable ver, so I didn't notice. I thought you would leave the behavior for search as it was. I guess you should also consider that there can be multiple from: in a query if you haven't already. Also @ can be used instead of from:

nisehime avatar Aug 14 '22 16:08 nisehime

For smaller galleries gallery-dl "https://twitter.com/search?q=from:Cotonus filter:links" does not grab nearly as much as gallery-dl https://twitter.com/Cotonus, and gallery-dl "https://twitter.com/search?q=from:Cotonus filter:media" only grabs 3 files. Twitter filters really suck these days.

biggestsonicfan avatar Sep 03 '22 04:09 biggestsonicfan

If you mean retweets you should add include:nativeretweets in the search

nisehime avatar Sep 03 '22 07:09 nisehime

I don't mean retweets.
gallery-dl "https://twitter.com/search?q=from:Cotonus filter:links" - 28 files
gallery-dl https://twitter.com/Cotonus - 32 files
gallery-dl "https://twitter.com/search?q=from:Cotonus filter:media" - 3 files

biggestsonicfan avatar Sep 03 '22 17:09 biggestsonicfan

Yeah, there are 2 posts which don't appear in the search at all. Even without filters.

nisehime avatar Sep 03 '22 19:09 nisehime

Popping back in here to say after fairly extensive testing, gallery-dl https://twitter.com/username is actually giving the maximum number of results at this point.

biggestsonicfan avatar Jan 24 '23 05:01 biggestsonicfan

Usually username and username/media work, but I'm pretty sure that if the account has a lot of retweets and media, you can't get everything. Try an account with some 5-10k tweets to see; that's Twitter's limit.

wankio avatar Jan 24 '23 12:01 wankio