gallery-dl
[twitter] [feature?] Possible to keep retweets and actual tweets by artists in separate folders?
Sometimes you get twitter accounts with ~500 art pieces, and ~5000 retweets of other artists' works.
Both are relevant to me, but I'd still like a distinction between the two.
I'm thinking something like:
C:\gallery-dl\twitter\ARTIST\
C:\gallery-dl\twitter\ARTIST\retweets\
Or possibly:
C:\gallery-dl\twitter\ARTIST\
C:\gallery-dl\twitter\ARTIST\retweets\RETWEETED TWITTER NAME\
Is that something already possible currently, and I simply need to adjust my config-file, or would that be a new feature?
My download queries look like this:
gallery-dl_1.22.0.exe --config gallery-dl_config.conf https://twitter.com/ARTIST/media
gallery-dl_1.22.0.exe --config gallery-dl_config.conf https://twitter.com/ARTIST
(I use both formats in my download queries.)
My twitter config:
"twitter":
{
    "username": "[REDACTED]",
    "password": "[REDACTED]",
    "cookies": "twitter.com_cookies.txt",
    "cookies-update": true,
    "retweets": true,
    "quoted": true,
    "replies": true,
    "text-tweets": true
}
(By the way, "text-tweets": true does not seem to do anything? I assumed it would download text tweets as text files.)
This is already possible using "conditional" directory format strings. Something like the following puts retweets and quoted tweets into their own sub-directories:
"directory": {
    "retweet_id"              : ["{category}", "{user[name]}", "Retweets", "{author[name]}"],
    "locals().get('quote_by')": ["{category}", "{user[name]}", "Quoted"  , "{author[name]}"],
    ""                        : ["{category}", "{user[name]}"]
}
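To make the selection logic a bit more concrete, here is a rough Python model of how such a conditional directory object can be read: every non-empty key is treated as a Python expression evaluated against the tweet's metadata, the first truthy one wins, and the "" entry is the fallback. This is only a simplified sketch, not gallery-dl's actual implementation; `pick_directory` and the sample metadata values are made up for illustration.

```python
def pick_directory(rules, metadata):
    """Return the formatted path segments of the first matching rule."""
    for condition, segments in rules.items():
        if not condition:
            continue  # the "" entry is only used as the fallback
        # Evaluate the key as a Python expression, with the metadata fields
        # as local names. A bare name such as `retweet_id` raises NameError
        # when the field is missing, which is presumably why the second
        # rule goes through locals().get(...) instead.
        # (eval is acceptable here only because the rules are your own config.)
        if eval(condition, {}, dict(metadata)):
            break
    else:
        segments = rules.get("", [])
    return [segment.format(**metadata) for segment in segments]


RULES = {
    "retweet_id": ["{category}", "{user[name]}", "Retweets", "{author[name]}"],
    "locals().get('quote_by')": ["{category}", "{user[name]}", "Quoted", "{author[name]}"],
    "": ["{category}", "{user[name]}"],
}

# Hypothetical metadata for three kinds of tweets found on ARTIST's timeline.
base = {"category": "twitter", "user": {"name": "ARTIST"}, "author": {"name": "OTHER"}}
retweet = dict(base, retweet_id=1234567890)
quote = dict(base, retweet_id=0, quote_by="ARTIST")
plain = dict(base, retweet_id=0, author={"name": "ARTIST"})

for tweet in (retweet, quote, plain):
    print("/".join(pick_directory(RULES, tweet)))
```

In this model a retweet ends up under twitter/ARTIST/Retweets/OTHER, a quoted tweet under twitter/ARTIST/Quoted/OTHER, and everything else under twitter/ARTIST.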
"text-tweets": true only affects the emitted metadata. To write that metadata to a file you have to use a post processor, for example:
"postprocessors": [
    {
        "name": "metadata",
        "event": "post",
        "filename": "{tweet_id}.json"
    }
]
(See issue #570)
"username": "[REDACTED]", "password": "[REDACTED]", "cookies": "twitter.com_cookies.txt", "cookies-update": true,
You only need either username & password or cookies. If twitter.com_cookies.txt contains an auth_token cookie, your username & password settings get ignored.
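If you are unsure which of the two login methods is actually in effect, a small script can check whether the cookies file contains an auth_token entry. This is a minimal sketch that assumes the file is in the usual tab-separated Netscape cookies.txt format; `has_auth_token` is a hypothetical helper, not part of gallery-dl.

```python
from pathlib import Path


def has_auth_token(cookies_path):
    """Return True if a Netscape-format cookies.txt has an auth_token cookie."""
    for line in Path(cookies_path).read_text(encoding="utf-8").splitlines():
        # Some exporters mark HttpOnly cookies with this prefix.
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        # Skip blank lines and comments.
        if not line.strip() or line.startswith("#"):
            continue
        # Fields: domain, subdomain flag, path, secure, expiry, name, value
        fields = line.split("\t")
        if len(fields) >= 7 and fields[5] == "auth_token":
            return True
    return False
```

Calling `has_auth_token("twitter.com_cookies.txt")` and getting `True` would mean that, per the explanation above, the username & password settings are not used.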
thanks, this is what i needed too!
That works, thanks for the quick reply! :)
A follow-up question regarding the .json file metadata;
I) I'm guessing there's nothing already built to just extract the tweet's text? The rest of the meta data is just clutter for me. NOT an issue that needs solving, I can build something myself to extract the data locally from the file after the fact, just asking if a solution already exists.
II) For the sake of keeping easier overview of image/video files in the twitter directories, I'd like to know if it is currently possible to keep the .json files separate from the image/video files, either by:
IIa) ...downloading the .json files into a subdirectory or entirely separate directory? ( I.e. C:\gallery-dl\twitter\ARTIST\JSON\, or C:\gallery-dl\twitter\ARTIST_JSON\, or C:\gallery-dl\twitter\JSON\ARTIST\ )
IIb) ...downloading ONLY the .json files, NOT any images/videos, in which case I'd just make a separate gallery-dl directory for just that. (I.e. C:\gallery-dl\twitter\ARTIST\ and C:\gallery-dl_JSON_ONLY\twitter\ARTIST\ )
If neither solution is currently possible, I'll just think of some workaround myself, it'll just be a minor nuisance I suppose. Just figured I'd ask in advance before I begin mass-downloading those .json files. ;-)
I) just extract the tweet's text?
You can control what gets extracted by setting mode to "custom" and setting a content-format format string:
"postprocessors": [
    {
        "name": "metadata",
        "event": "post",
        "filename": "{tweet_id}.txt",
        "mode": "custom",
        "content-format": "{content}"
    }
]
(the text content of a tweet is stored in content)
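For .json metadata files that were already downloaded, the text can also be pulled out after the fact. The following is a minimal sketch, assuming each file holds a single JSON object with the tweet text under the "content" key as described above; `extract_tweet_texts` is a hypothetical helper name.

```python
import json
from pathlib import Path


def extract_tweet_texts(directory):
    """Map each metadata file name to the tweet text stored under "content"."""
    texts = {}
    # Walk the directory recursively so Retweets/ and Quoted/ are included.
    for path in sorted(Path(directory).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            metadata = json.load(handle)
        # Fall back to an empty string if a file has no "content" field.
        texts[path.name] = metadata.get("content", "")
    return texts
```

For example, `extract_tweet_texts(r"C:\gallery-dl\twitter\ARTIST")` would return a dictionary keyed by file name, which can then be printed or written wherever it is needed.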
IIa) ...downloading the .json files into a subdirectory or entirely separate directory?
Can be done by setting a directory for the metadata post processor. Keep in mind that this value is only interpreted as a static string with environment variable support; it is not a full format string like the regular directory value.
Storing all .json files in C:\gallery-dl\twitter\ARTIST\JSON would be done by adding "directory": "JSON" to the post processor above.
IIb) ...downloading ONLY the .json files, NOT any images/videos
Can be done with --no-download or the download config option.
All configuration file options and post processor options can be found in docs/configuration.rst, with some usage examples in docs/gallery-dl-example.conf
Oh cool, thanks! I'm gonna try that and come back in case I mess it up somehow. ^_^#
sorry off topic but anyway to display only the tweets numbers, retweets number and total media number from -K command?
Alright, everything works, I'm ready to close the topic from my end. :)
sorry off topic but anyway to display only the tweets numbers, retweets number and total media number from -K command?
I messed around with that a bit, not completely sure what you meant, but did you want something like this?
"twitter":
{
    "username": "[REDACTED]",
    "password": "[REDACTED]",
    "cookies": "twitter.com_cookies.txt",
    "cookies-update": true,
    "retweets": true,
    "quoted": true,
    "replies": true,
    "text-tweets": true,
    "directory": {
        "retweet_id"              : ["{category}", "{user[name]}", "Retweets", "{author[name]}"],
        "locals().get('quote_by')": ["{category}", "{user[name]}", "Quoted"  , "{author[name]}"],
        ""                        : ["{category}", "{user[name]}"]
    },
    "postprocessors": [
        {
            "name": "metadata",
            "event": "post",
            "filename": "{tweet_id}.filetypeformatofyourchoicehere.txt",
            "mode": "custom",
            "content-format": "{content} {tweet_id} {retweet_id} {author[media_count]} {author[statuses_count]}",
            "directory": "WHATEVER"
        }
    ]
},
"content-format": "{content} {tweet_id} {retweet_id} {author[media_count]} {author[statuses_count]}" adds the text, tweet id and retweet id, and then "media count" and "statuses count" (dunno if that's what you wanted), and looks something like this:
blablabla 1234567890123456789 9876543210987654321 6 53
(Be aware that "blablabla" can also have blankspaces and linebreaks.)
...and writes it into the file path
C:\gallery-dl\twitter\ARTIST\Retweets\WHATEVER\1234567890123456789.filetypeformatofyourchoicehere.txt
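Since the tweet text can itself contain spaces and line breaks, a file in the format shown above is easiest to read back by splitting from the right. The following is a rough sketch, assuming the four trailing fields never contain whitespace and both counts are plain integers; `parse_summary` is a made-up helper name.

```python
def parse_summary(text):
    """Split '<content> <tweet_id> <retweet_id> <media> <statuses>' from the right."""
    # The last four whitespace-separated tokens are the fixed fields;
    # whatever remains on the left is the tweet text, line breaks included.
    fields = text.rsplit(None, 4)
    if len(fields) == 4:
        fields.insert(0, "")  # tweet without any text
    if len(fields) != 5:
        raise ValueError("expected at least four trailing fields")
    content, tweet_id, retweet_id, media_count, statuses_count = fields
    return {
        "content": content,
        # IDs are kept as strings; only the counts are converted.
        "tweet_id": tweet_id,
        "retweet_id": retweet_id,
        "media_count": int(media_count),
        "statuses_count": int(statuses_count),
    }
```

Splitting from the left would break as soon as the text contains a space, which is why `rsplit` with a limit of four is used here.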
Is this vaguely in the direction of what you wanted? Yay I'm helping...? ^_^#
Updated the above comment to include "media count" and "status count" (not sure if that is what you wanted).
I got all those parameters from the JSON file it downloads when you use the config info described in mikf's first reply on this topic.
No, but it's close enough. All I wanted is to see a user's total number of tweets, total number of retweets and total number of media, either displayed in a command prompt or written to a .txt file.
@afterdelight
no but its close enough
Then I'm gonna close the issue. Or was there something else you needed?
Do you want to get the information only once as an overview, or is it supposed to auto-update itself?
I think it should be possible to set it up so that it'd get the "media count / statuses count" to only be written to a single file per account, once, and then stop and do the next account (this would skip information of retweets from other accounts of course) - each with some easily searchable/extractable filename. And also to update them. Though this probably would have to be a separate command line call from the main gallery download because of the "get one result and then stop" bit.
I think that should be possible, though I'd have to look into it further. Is that something relevant to you?
(And if you need everything combined in one file for overview, this may be done via command line, maybe something like
for /R %f in (*.txt) do type "%f" >> c:\Test\output.txt
)
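If you would rather not rely on the Windows command line for this, the same idea can be sketched in Python. This is only an illustration under a few assumptions: all files are UTF-8 text, `combine_text_files` is a hypothetical helper, and unlike the `>>` redirection it overwrites the output file instead of appending to it.

```python
from pathlib import Path


def combine_text_files(root, output):
    """Concatenate every .txt file below root into output; return the file count."""
    output = Path(output).resolve()
    count = 0
    with output.open("w", encoding="utf-8") as out:
        for path in sorted(Path(root).rglob("*.txt")):
            # Do not copy the output file into itself if it lives below root.
            if path.resolve() == output:
                continue
            text = path.read_text(encoding="utf-8")
            out.write(text)
            # Keep entries apart when a file lacks a trailing newline.
            if text and not text.endswith("\n"):
                out.write("\n")
            count += 1
    return count
```

A call such as `combine_text_files(r"C:\gallery-dl\twitter", r"C:\Test\output.txt")` would collect all the small .txt files into one overview file.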
'"media count / statuses count" to only be written to a single file per account, once, and then stop and do the next account'
Yes, this is what i want!! How to do that?? Sorry for long reply.
With a metadata post processor.
"postprocessors": [
    {
        "name": "metadata",
        "event": "init",
        "filename": "{user[name]}.txt",
        "mode": "custom",
        "format": "{user[media_count]}\n{user[statuses_count]}\n"
    }
]
You might also want to add "image-range": "0" to "twitter" or use --range 0 to stop before the first media file download.
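Once there is one such file per account, the numbers can be gathered into a single overview. The following is a minimal sketch, assuming every file is named after the account and holds the media count on the first line and the tweet count on the second, as in the post processor above; `read_account_counts` is a hypothetical helper, and files that do not fit this layout are simply skipped.

```python
from pathlib import Path


def read_account_counts(directory):
    """Return {account: (media_count, tweet_count)} from per-account .txt files."""
    counts = {}
    for path in sorted(Path(directory).glob("*.txt")):
        values = path.read_text(encoding="utf-8").split()
        if len(values) < 2:
            continue  # not one of the two-line summary files
        try:
            media, tweets = int(values[0]), int(values[1])
        except ValueError:
            continue  # a field was missing or not numeric
        # The file stem is the account name, since the file is "{user[name]}.txt".
        counts[path.stem] = (media, tweets)
    return counts
```

Looping over the result and printing `name, media, tweets` gives the kind of command prompt overview asked about earlier in the thread.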
Update to the initial question: Since v1.23.0 it is possible to replace the rather lengthy "locals().get('quote_by')" condition from https://github.com/mikf/gallery-dl/issues/2663#issuecomment-1149051520 with just "quote_id".
Thanks, it worked! This is my config:
"postprocessors": [{
"directory" : "",
"name" : "metadata",
"event" : "init",
"filename" : "{user[name]}_{date:%Y%m%d}.info.txt",
"mode" : "custom",
"content-format": "Nick: {author[nick]}\nAccount Created: {author[date]}\nLocation: {author[location]}\nUrl: {author[url]}\nTotal Tweets: {author[statuses_count]}\nTotal Medias: {author[media_count]}\nTotal Retweet: {retweet_count}\nTotal Quote: {quote_count}\nTotal Reply: {reply_count}\nProfile Banner: {author[profile_banner]}\nProfile Picture: {author[profile_image]}"
}]
You should not use author[…] in your content-format string. If the first tweet is a retweet/quote, this will write data for the wrong user. Just keep everything as user[…].
oh right, thanks for the input. i have corrected my mistakes.