gallery-dl
Can't set retweets to download to a separate folder
According to the following issues on GitHub, this feature should be available: https://github.com/mikf/gallery-dl/issues/1421 and https://github.com/mikf/gallery-dl/issues/1334
I am currently on version 1.17.2.
config.set(("extractor", "twitter"), "directory", ["twitter","{author[name]}","archive"])
config.set(("extractor", "twitter"), "retweets", ("directory", ["Twitter", "{user_likes}", "Likes"]))
I have tried a variety of values for "retweets", but nothing changes unless I set it to False. Then it doesn't download retweets, as expected. When I feed gallery-dl a URL that is a user's Twitter page, I want to download the user's tweets into a base folder and their retweets into a separate folder within that folder. For example:
targetUser targetUser/retweets
That is what I want.
I "fixed" this by downloading to a directory one level deeper with retweets enabled, then running a script that moves the downloaded images back out if they match the target username. Not an elegant solution, but I'm writing this here for anyone wondering the same thing.
I thought it might've been possible to do this with
"{'.' if retweet_id==0 else 'Retweets'}"
but the function that handles this (https://github.com/mikf/gallery-dl/blob/d09bc5bd3462b75a784c8406c549e1c1858f9852/gallery_dl/util.py#L599) doesn't actually use f-strings, probably to avoid arbitrary code execution problems.
Sorry to bug you, @mikf, but you got any ideas?
I suspect it's more because f-strings are evaluated immediately at runtime (they can't be stored and evaluated later), while str.format() can use templates saved in a string, like {author[name]}.
Plus, it's faster and far more powerful than the old printf-style percent formatting still used by youtube-dl for compatibility, which can't do something like {author[name]}. f-strings are even faster and more powerful, and really should be used when templates aren't needed.
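A small sketch of that difference, using only plain Python and made-up metadata values (the names `template` and `metadata` are just for illustration, not gallery-dl internals): a str.format() template can be stored as data and filled in later, including nested lookups like {author[name]}.

```python
# A template stored as an ordinary string, e.g. loaded from a config file.
template = "{author[name]}/{tweet_id}"

# Hypothetical metadata for one tweet; the values are invented for this example.
metadata = {"author": {"name": "targetUser"}, "tweet_id": 123}

# The template is only evaluated here, long after it was defined.
path = template.format_map(metadata)
print(path)
```

An f-string with the same placeholders would need `author` and `tweet_id` to exist at the point where the string literal is written, which is why it can't serve as a stored template.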
"{retweet_id:?//L0/Retweets/}"
seems to work. It produces an empty string when retweet_id is 0 and Retweets otherwise.
f-strings are Python 3.6+ only and I kind of want to keep Python 3.4 compatibility for gallery-dl v1.x.
"{retweet_id:?//L0/Retweets/}"
seems to work.
Care to explain how on earth that works and how you managed to come up with it? I am very confused
This feels like it should be documented somewhere
?// returns an empty string when its input (retweet_id) evaluates to False (e.g. is 0) and otherwise returns its input as a string. It also stops any further processing.
L0/Retweets/ returns its input if its len() is <= 0 and Retweets otherwise.
So if retweet_id == 0 -> empty string through ?//, otherwise Retweets through L0/…/
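For anyone who finds it easier to read as code, here is a rough emulation of those two steps in plain Python. This is only a sketch of the behavior described above, not gallery-dl's actual formatter code; the function name and parameters are made up for illustration.

```python
def emulate_retweet_spec(retweet_id, max_len=0, replacement="Retweets"):
    """Emulate "{retweet_id:?//L0/Retweets/}" as described in the thread."""
    # Step 1, "?//": a falsy input (e.g. 0) yields an empty string
    # and no further processing takes place.
    if not retweet_id:
        return ""
    text = str(retweet_id)
    # Step 2, "L0/Retweets/": keep the text if len(text) <= max_len,
    # otherwise substitute the replacement string.
    if len(text) <= max_len:
        return text
    return replacement


print(repr(emulate_retweet_spec(0)))
print(repr(emulate_retweet_spec(1408818108224131074)))
```

Since any non-zero ID is at least one character long as a string, the second step always ends up substituting Retweets.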
This feels like it should be documented somewhere
I know, and it kind of is here. The whole string formatting system needs to be redone with a proper parser and all that at some point, and I've been delaying writing any docs until that is done ...
config.set(("extractor", "twitter"), "retweets", ("directory", ["Twitter", "{user_likes}", "Likes"]))
I think this cannot possibly work, by the way, because you can only set the "retweets" option for Twitter to a boolean value (or to the special value "original").
When I feed gallery-dl a URL that is a user's Twitter page, I want to download the user's tweets into a base folder and their retweets into a separate folder within that folder. For example:
targetUser targetUser/retweets
That is what I want.
The good news: This should be possible now.
Look here: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractordirectory
The "directory" option can - analogous to the "filename" option - be set to an object containing Python expression mappings.
So it should be possible to do something like this (in the Twitter section of your config file):
"directory": {
"retweet_id != 0" : ["Twitter", "{user[name]}", "Retweets"],
"" : ["Twitter", "{user[name]}"]
}
@ExeArco Could you please try this?
config.set(
    ("extractor", "twitter"),
    "directory", [
        {
            "retweet_id != 0" : ["{category}", folder, "archive", "retweets"],
            ""                : ["{category}", folder, "archive"]
        }
    ]
)
This is what I attempted to use, however it gives me the following error: [twitter][error] DirectoryFormatError: Applying directory format string failed (TypeError: expected str, got dict)
And just to confirm, I am on 1.18.0.
It was implemented one commit after 1.18.0
You'll want to overwrite C:/Python39/Lib/site-packages/gallery_dl/util.py
with this version of util.py
Alright, I updated to 1.18.1-dev and I get this error instead: [twitter][error] DirectoryFormatError: Applying directory format string failed (TypeError: unhashable type: 'dict')
Your directory value is a list with a dict as its element. It should just be a dict:
config.set(
    ("extractor", "twitter"),
    "directory", {
        "retweet_id != 0" : ["{category}", folder, "archive", "retweets"],
        ""                : ["{category}", folder, "archive"],
    },
)
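To make the idea of a condition-to-directory mapping more concrete, here is a simplified sketch of how such a dict could be resolved against a tweet's metadata. It is an assumption-laden illustration in plain Python (the `resolve_directory` helper is hypothetical and this is not gallery-dl's implementation): each non-empty key is evaluated as a Python expression with the metadata fields as variables, and the empty key acts as the fallback.

```python
def resolve_directory(conditions, metadata):
    """Pick the path segments of the first matching condition and format them."""
    chosen = conditions.get("", [])  # the empty key is the fallback
    for expression, segments in conditions.items():
        # eval() is acceptable here only because the expressions come from
        # the user's own config; never evaluate untrusted strings this way.
        if expression and eval(expression, {}, dict(metadata)):
            chosen = segments
            break
    return [segment.format_map(metadata) for segment in chosen]


rules = {
    "retweet_id != 0": ["Twitter", "{user[name]}", "Retweets"],
    "": ["Twitter", "{user[name]}"],
}
# Hypothetical metadata; real entries contain many more fields.
print(resolve_directory(rules, {"retweet_id": 42, "user": {"name": "targetUser"}}))
print(resolve_directory(rules, {"retweet_id": 0, "user": {"name": "targetUser"}}))
```

It also shows why the value has to be a dict: the keys are what get evaluated, so a list wrapped around the dict leaves nothing to look up.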
Alright, I fixed that, but now it doesn't seem to be separating them at all; it never seems to trigger the retweet_id != 0 condition. Here are all the config.set calls related to Twitter that run before I start that downloadJob.
config.set(("extractor", "twitter"), "filename", "{user[name]}_{tweet_id}_{date}_{num}.{extension}")
config.set(("extractor", "twitter"), "quoted", True)
config.set(("extractor", "twitter"), "text-tweets", True)
config.set(("extractor", "twitter"), "retweets", "original")
config.set(
    ("extractor", "twitter"),
    "postprocessors", [
        {
            "name": "metadata",
            "event": "post",
            "filename": "{user[name]}_{tweet_id}_{date}_{num}.data.json"
        }
    ]
)
config.set(
    ("extractor", "twitter"),
    "directory", {
        "retweet_id != 0" : ["{category}", folder, "archive", "retweets"],
        ""                : ["{category}", folder, "archive"]
    },
)
It is not really possible to differentiate between Tweets and Retweets when setting retweets to "original". It replaces each Retweet entry with its original Tweet, which has a retweeted_status_id_str/retweet_id value of 0.
Since you are using Python, you could change the condition before each new Twitter URL to check if the author['name'] field matches the expected username and determine if it's a Retweet like that. Or gallery-dl could update the retweet_id value in such cases, which shouldn't break anything else.
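A rough sketch of that first suggestion, assuming the script knows the target username before it queues each profile URL. The helper name `retweet_rules` is made up for this example, and the directory layout simply mirrors the one used earlier in this thread.

```python
def retweet_rules(target_user, folder):
    """Build a "directory" mapping that treats foreign authors as retweets."""
    # repr() via {!r} quotes the username so it is a valid Python string literal
    # inside the condition expression.
    condition = "author['name'] != {!r}".format(target_user)
    return {
        condition: ["{category}", folder, "archive", "retweets"],
        "": ["{category}", folder, "archive"],
    }


rules = retweet_rules("targetUser", "targetUser")
print(list(rules))

# Before starting the job for this profile, the mapping would be applied
# the same way as the other options in this thread:
# config.set(("extractor", "twitter"), "directory", rules)
```

The condition has to be rebuilt for every profile, since the username is baked into the expression string.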
I'm using this to archive the user profile themselves, not passing through individual retweet/tweet/status links, so I can't check each URL itself. It would be a much more elegant solution if I could just do it all in one go.
What's the specific reason for not using extractor.twitter.retweets = true?
If I set retweets = true, it sets the username of the retweet to the target of my download, not the original poster of what is being retweeted. For example: I want to archive posts by A. If I set retweets to true, my subfolder with all the retweets will be full of content from B, C, D, E, F, G, but the files will all be named as if A had actually tweeted them.
You can use retweets: true and replace the {user[name]} in filename with {author[name]}
Yes, that is exactly what I am using here.
If I do as Scripter17 suggested, it works; however, it now doesn't sort out quoted retweets, and looking through the metadata JSON that I have, there doesn't appear to be anything I can use to sort them out, unless I am missing something.
quoted retweets
Do you mean regular quote tweets or quotes of a retweet? I don't think I've ever seen the latter, but maybe that exists as well.
it appears there is not anything I can use to sort that out
Quoted tweets have a non-zero quote_id, so it'd be something like
"directory": {
"retweet_id": ["{category}", folder, "archive", "retweets"],
"quote_id" : ["{category}", folder, "archive", "quotes"],
"" : ["{category}", folder, "archive"],
}
I've also updated the behavior of "retweets": "original" to have a non-zero value for retweet_id (https://github.com/mikf/gallery-dl/commit/414bdc95a342c333c646b273d81bf74304475c53)
I'll be honest, I'm not sure exactly what this is called either, but take this example: https://twitter.com/BarackObama is the link I send to download. Further down, Obama quotes a NetflixFilm account (https://twitter.com/BarackObama/status/1408818108224131074), and I would like that put in retweets. Looking through the metadata, it appears quote_id, reply_id, and retweet_id are all 0, so I am not sure how to separate this one out.
I am still looking for a fix to this, does anyone have one? I have just updated to the latest version and there still seems to be no way to properly separate those tweets out.
[..] it appears quote_id, reply_id, and retweet_id are all 0 so I am not sure how to separate this one out.
Not sure, but if this is what gets returned by Twitter..
PS D:\> gallery-dl --ignore-config -K 'https://twitter.com/BarackObama/status/1408818108224131074' --option '"quoted"=true' | sls "tweet|quote|reply" -NoEmphasis -Context 0,1
> quote_count
439
> quote_id
0
> reply_count
268
> reply_id
0
> retweet_count
792
> retweet_id
0
> tweet
> tweet_id
1408510014847959053
> quote_count
439
> quote_id
0
> reply_count
268
> reply_id
0
> retweet_count
792
> retweet_id
0
> tweet
> tweet_id
1408510014847959053
PS D:\>
This tweet seems a bit strange.. If I don't use -o '"quoted"=true', gallery-dl skips this tweet with default settings ("skipping quoted tweet")..
So, something is a bit off here, I guess. I blame Twitter.
I mean, it is a quoted tweet, right? So then why is quote_id 0? I really would like to separate these out through gallery-dl.
As it turns out, quote_id is the ID of the quoted Tweet and is only nonzero for Tweets that contain a quote, not for the Tweet being quoted. It is the exact opposite of how retweets behave and I have no idea why I implemented it like that back when.
Instead of checking for quote_id, you could compare user and author and classify it as a quote when the two differ.
"directory": {
"retweet_id" : ["{category}", folder, "archive", "retweets"],
"user != author": ["{category}", folder, "archive", "quotes"],
"" : ["{category}", folder, "archive"],
}
use "locals().get('quote_by')"
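For reference, the retweet_id / "user != author" ordering suggested above can be written out as an ordinary Python function. This is only a sketch with invented metadata to show which bucket each kind of entry would land in; the `classify` helper is hypothetical and is not how gallery-dl evaluates its conditions internally.

```python
def classify(tweet):
    """Return the subfolder name for a tweet's metadata dict ("" = base folder)."""
    # Checked first, like the first key of the "directory" mapping.
    if tweet.get("retweet_id"):
        return "retweets"
    # A quoted Tweet keeps its own author while user is the profile being archived.
    if tweet.get("user") != tweet.get("author"):
        return "quotes"
    return ""


# Invented example entries; real metadata has many more fields.
target = {"name": "targetUser"}
other = {"name": "someoneElse"}
print(classify({"retweet_id": 0, "user": target, "author": target}))
print(classify({"retweet_id": 0, "user": target, "author": other}))
print(classify({"retweet_id": 99, "user": target, "author": other}))
```

The order matters: a retweet also has a different author, so the retweet_id check has to come before the user/author comparison.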