Gallery-dl missed an image while scraping a Twitter account, curious to know how to fix this for future attempts

Open rarelygoeshere opened this issue 1 year ago • 9 comments

Hello there, while I was checking into the content of this account, I noticed that it was missing an image.

For whatever reason, it didn't scrape this tweet from their account, despite it clearly being present: https://twitter.com/mamezurushiki/status/328853786019917825 I even checked the output, which I attached below, and searched for the tweet ID (328853786019917825), but found nothing. That suggests the download didn't fail; the tweet was never scraped in the first place.

I don't know if this is because of Twitter's recent update or something else, but I hope you can fix this so I can be reassured that my gallery-dl is scraping all of Twitter's content, or as much of it as it is capable of. Thank you.

mamezurushiki gallery-dl output.txt

rarelygoeshere avatar Dec 25 '23 11:12 rarelygoeshere

It downloads fine when given the tweet URL directly.

[gallery-dl][debug] Version 1.26.4
[gallery-dl][debug] Python 3.8.3 - Windows-10-10.0.17763-SP0
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/mamezurushiki/status/328853786019917825'
[twitter][debug] Using TwitterTweetExtractor for 'https://twitter.com/mamezurushiki/status/328853786019917825'
[twitter][info] Requesting guest token
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.twitter.com:443
[urllib3.connectionpool][debug] https://api.twitter.com:443 "POST /1.1/guest/activate.json HTTP/1.1" 200 63
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/2ICDjqPd81tulZcYrtpTuQ/TweetResultByRestId?variables=%7B%22tweetId%22%3A%22328853786019917825%22%2C%22withCommunity%22%3Afalse%2C%22includePromotedContent%22%3Afalse%2C%22withVoice%22%3Afalse%7D&features=%7B%22creator_subscriptions_tweet_preview_api_enabled%22%3Atrue%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22responsive_web_twitter_article_tweet_consumption_enabled%22%3Afalse%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Atrue%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Atrue%2C%22longform_notetweets_rich_text_read_enabled%22%3Atrue%2C%22longform_notetweets_inline_media_enabled%22%3Atrue%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22responsive_web_media_download_video_enabled%22%3Afalse%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%7D&fieldToggles=%7B%22withArticleRichContentState%22%3Afalse%7D HTTP/1.1" 200 1573
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/BJBSxqtCcAM668X?format=jpg&name=orig HTTP/1.1" 200 47835
* .\gallery-dl\twitter\mamezurushiki\328853786019917825_1.jpg

Try adding "/media" to the username URL to see if it skips the tweet again.

jadedgnome avatar Dec 26 '23 09:12 jadedgnome

Tweet 328853786019917825 is very old - over 10 years - so the only way to get it from just the username (that I know of and what gallery-dl uses) is via search, and search results are unreliable and incomplete.
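To illustrate the age of the tweet and the search route: tweet IDs of this era are Snowflake IDs, which embed a millisecond timestamp, so the posting date can be read from the ID alone. The sketch below decodes that date and builds a search URL restricted to a few days around it. The helper names are made up for this example, and the `from:`/`since:`/`until:` operators are ordinary Twitter search syntax rather than anything specific to gallery-dl. Since search results are incomplete, a narrow window like this may or may not surface the tweet.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

# Snowflake epoch in milliseconds (2010-11-04 UTC); only IDs issued
# after that date carry a timestamp.
TWITTER_EPOCH_MS = 1288834974657


def tweet_datetime(tweet_id: int) -> datetime:
    """Decode the creation time embedded in a Snowflake tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def search_url(user: str, day: datetime, pad_days: int = 1) -> str:
    """Build a search URL limited to a few days around `day` (hypothetical helper)."""
    since = (day - timedelta(days=pad_days)).strftime("%Y-%m-%d")
    until = (day + timedelta(days=pad_days + 1)).strftime("%Y-%m-%d")
    query = f"from:{user} since:{since} until:{until}"
    return "https://twitter.com/search?q=" + quote(query)


when = tweet_datetime(328853786019917825)
print(when.date())
print(search_url("mamezurushiki", when))
```

The printed URL can be passed to gallery-dl like any other Twitter search URL.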

mikf avatar Dec 26 '23 14:12 mikf

> Tweet 328853786019917825 is very old - over 10 years - so the only way to get it from just the username (that I know of and what gallery-dl uses) is via search, and search results are unreliable and incomplete.

Hmmmm I see. So does that mean gallery-dl will be hard pressed to scrape that tweet and similar tweets, even if I try adding "/media" to the username URL like the comment above suggests? So I guess this problem is probably not solvable on gallery-dl's side?

Edit: Does the recent Twitter update impact my usage of gallery-dl in any capacity? Do I need to change my twitter config to deal with it?

rarelygoeshere avatar Dec 27 '23 12:12 rarelygoeshere

In my experience, the search results seem to change over time. Twitter has this to say about what appears in search:

> Do your posts contribute to the conversation in a meaningful way? We strive to show the most relevant, credible, and safe content in search.

They don't allow NSFW tweets to be in it either.

https://help.twitter.com/en/using-x/x-search-not-working

Edit: I have no idea why my post duplicated when I edited it. I was using the mobile app so I couldn't preview the markdown

Twi-Hard avatar Dec 27 '23 13:12 Twi-Hard

> They don't allow NSFW tweets to be in it either.

There is an option to show/hide "sensitive" search results (under "Settings" -> "Privacy and safety" -> "Content you see" -> "Search settings").

mikf avatar Dec 28 '23 21:12 mikf

> Tweet 328853786019917825 is very old - over 10 years - so the only way to get it from just the username (that I know of and what gallery-dl uses) is via search, and search results are unreliable and incomplete.
>
> Hmmmm I see. So does that mean gallery-dl will be hard pressed to scrape that tweet and similar tweets, even if I try adding "/media" to the username URL like the comment above suggests? So I guess this problem is probably not solvable on gallery-dl's side?
>
> Edit: Does the recent Twitter update impact my usage of gallery-dl in any capacity? Do I need to change my twitter config to deal with it?

Sorry to bother folks, but would anyone mind answering my inquiry? I'm curious to know if Twitter's new update necessitates changing my config to make sure gallery-dl scrapes as much as possible.

rarelygoeshere avatar Dec 29 '23 11:12 rarelygoeshere

I think it depends on what you include to download. For example, I'm re-scraping some profiles to test and it downloaded new pictures using "include": ["timeline", "media", "replies"], in my config file. Though I'm still not very sure about the timeline.strategy
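As a rough illustration of where that `include` setting lives, the sketch below builds the relevant fragment as a Python dict and prints it as JSON. The nesting under `extractor` -> `twitter` follows gallery-dl's usual config layout; the list values are the ones quoted above, and everything else from a real config file is omitted here.

```python
import json

# Minimal fragment only; a real gallery-dl config file would hold
# further settings next to this one.
config = {
    "extractor": {
        "twitter": {
            # Which parts of a profile are processed for a bare username URL.
            "include": ["timeline", "media", "replies"],
        }
    }
}

rendered = json.dumps(config, indent=4)
print(rendered)
```

The printed JSON shows the nesting to reproduce by hand in an existing config file.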

Fukitsu avatar Dec 30 '23 00:12 Fukitsu

> I think it depends on what you include to download. For example, I'm re-scraping some profiles to test and it downloaded new pictures using "include": ["timeline", "media", "replies"], in my config file. Though I'm still not very sure about the timeline.strategy

Ok, well here's my config for Twitter. I'm quite certain there is nothing amiss with it and that it should be able to download all the media tweets of a profile.

```
"twitter":
{
    "username": "null",
    "password": "null",
    "filename": "{author[name]}-{author[id]}({author[date]:%Y%m%d_%H%M%S})-{tweet_id}({date:%Y%m%d_%H%M%S})-{num}.{extension}",
    "cards": false,
    "conversations": false,
    "pinned": false,
    "quoted": false,
    "replies": true,
    "retweets": false,
    "strategy": null,
    "text-tweets": false,
    "twitpic": false,
    "unique": true,
    "users": "timeline",
    "videos": true
}
```

rarelygoeshere avatar Jan 05 '24 05:01 rarelygoeshere

Is there any way to pass all of these on the command line via flags?

jadedgnome avatar Jan 05 '24 10:01 jadedgnome
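
On the command-line question: gallery-dl has an `-o`/`--option` flag that takes `KEY=VALUE` pairs, so one possible approach is to repeat it once per setting. The sketch below turns a dict of options into such arguments; `option_args` is a made-up helper for this example, and it assumes the value is written in JSON form (`true`, `false`, `null`), which is worth checking against the option documentation for your version.

```python
import json
import shlex


def option_args(options: dict) -> list:
    """Turn a dict of settings into repeated '-o KEY=VALUE' arguments (hypothetical helper)."""
    args = []
    for key, value in options.items():
        # JSON encoding gives true/false/null instead of Python's True/False/None.
        args += ["-o", f"{key}={json.dumps(value)}"]
    return args


options = {
    "replies": True,
    "retweets": False,
    "text-tweets": False,
    "videos": True,
}
cmd = ["gallery-dl", *option_args(options), "https://twitter.com/mamezurushiki"]

# Print a shell-quoted command line that can be copied into a terminal.
print(shlex.join(cmd))
```

For a long list of settings like the one quoted above, keeping them in the config file is likely easier to maintain than a long command line.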