gallery-dl Can't set retweets to download to a separate folder

Open ExeArco opened this issue 3 years ago • 26 comments

According to the following GitHub issues, this feature should be available: https://github.com/mikf/gallery-dl/issues/1421 and https://github.com/mikf/gallery-dl/issues/1334

I am currently on version 1.17.2.

config.set(("extractor", "twitter"), "directory", ["twitter","{author[name]}","archive"])
config.set(("extractor", "twitter"), "retweets", ("directory", ["Twitter", "{user_likes}", "Likes"]))

I have tried a variety of values for "retweets", but nothing changes unless I set it to False, in which case retweets are not downloaded, as expected. When I feed gallery-dl the URL of a user's Twitter page, I want to download the user's tweets into a base folder and their retweets into a separate folder inside it. For example:

targetUser
targetUser/retweets

That is what I want.

Apr 19 '21 07:04 ExeArco

I "fixed" this by downloading to a directory further in with retweets enabled, with a script that follows the downloaded images out if they match the target username. Not an elegant solution, but I'm writing this here for anyone wondering the same thing.

Apr 29 '21 12:04 ExeArco

I thought it might've been possible to do this with "{'.' if retweet_id==0 else 'Retweets'}", but the function that handles this doesn't actually use f-strings, probably to avoid arbitrary code execution problems.

Sorry to bug you, @mikf, but you got any ideas?

Jun 12 '21 01:06 Scripter17

I thought it might've been possible to do this with "{'.' if retweet_id==0 else 'Retweets'}" but the function that handles this (https://github.com/mikf/gallery-dl/blob/d09bc5bd3462b75a784c8406c549e1c1858f9852/gallery_dl/util.py#L599) doesn't actually use fstrings, probably to avoid arbitrary code execution problems

Sorry to bug you, @mikf, but you got any ideas?

I suspect it's more because f-strings are evaluated immediately at runtime (they can't be stored and filled in later), while str.format() can use templates saved in a string, like {author[name]}.

Plus, str.format() is way more powerful than the old printf-style percent formatting still used by youtube-dl for compatibility, which can't do something like {author[name]}. f-strings are even faster and more powerful, and really should be used when templates aren't needed.

Jun 12 '21 01:06 rautamiekka

"{retweet_id:?//L0/Retweets/}" seems to work. It produces an empty string when retweet_id is 0 and Retweets otherwise.

f-strings are Python 3.6+ only and I kind of want to keep Python 3.4 compatibility for gallery-dl v1.x.

Jun 12 '21 13:06 mikf

"{retweet_id:?//L0/Retweets/}" seems to work.

Care to explain how on earth that works and how you managed to come up with it? I am very confused

This feels like it should be documented somewhere

Jun 12 '21 13:06 Scripter17

?// returns an empty string when its input (retweet_id) evaluates to False (e.g. is 0) and otherwise returns its input as a string. It also stops any further processing.

L0/Retweets/ returns its input if its len() is <= 0 and Retweets otherwise.

So if retweet_id == 0 -> empty string through ?//, otherwise Retweets through L0/…/
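The two steps can be imitated in plain Python to see why the result is either empty or Retweets. This is only a simplified sketch of the behaviour described above, not gallery-dl's actual formatter code; the function name retweet_subdir and its parameters are made up for illustration.

```python
def retweet_subdir(retweet_id, maxlen=0, replacement="Retweets"):
    """Imitate "{retweet_id:?//L0/Retweets/}" for a single value."""
    # "?//": a falsy value (e.g. 0) yields an empty string and
    # stops any further processing.
    if not retweet_id:
        return ""
    # Otherwise the value is passed on as a string (the "before" and
    # "after" parts between the slashes are empty here).
    text = str(retweet_id)
    # "L0/Retweets/": keep the text if len(text) <= 0,
    # otherwise substitute the replacement.
    return text if len(text) <= maxlen else replacement


print(repr(retweet_subdir(0)))                    # ''
print(repr(retweet_subdir(1408818108224131074)))  # 'Retweets'
```

An empty path segment is what makes regular tweets stay in the base folder while retweets get the extra Retweets level.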

This feels like it should be documented somewhere

I know, and it kind of is here. The whole string formatting system needs to be redone with a proper parser and all that at some point, and I've been delaying writing any docs until that is done ...

Jun 12 '21 14:06 mikf

config.set(("extractor", "twitter"), "retweets", ("directory", ["Twitter", "{user_likes}", "Likes"]))

I think this cannot possibly work, by the way, because you can only set the "retweets" option for Twitter to a boolean value (or to the special value "original").

When I feed gallery-dl the URL of a user's Twitter page, I want to download the user's tweets into a base folder and their retweets into a separate folder inside it. For example:

targetUser
targetUser/retweets

That is what I want.

The good news: This should be possible now.

Look here: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractordirectory

The "directory" option can - analogous to the "filename" option - be set to an object containing Python expression mappings. So it should be possible to do something like this (in the Twitter section of your config file):

"directory": {
    "retweet_id != 0"    : ["Twitter", "{user[name]}", "Retweets"],
    ""                   : ["Twitter", "{user[name]}"]
}
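Roughly speaking, each key is a Python expression that gets evaluated against the metadata of a file, and the matching entry decides the directory, with "" as the fallback. The following is a minimal sketch of that idea, assuming the first true condition wins; pick_directory and the sample metadata are hypothetical, and this is not gallery-dl's own code.

```python
def pick_directory(conditions, metadata):
    """Return the path segments of the first condition that holds."""
    for expression, segments in conditions.items():
        if not expression:
            continue  # "" is the fallback, handled below
        # Evaluate the expression with the metadata fields as variables.
        if eval(expression, {"__builtins__": {}}, dict(metadata)):
            return segments
    return conditions[""]


conditions = {
    "retweet_id != 0": ["Twitter", "{user[name]}", "Retweets"],
    "": ["Twitter", "{user[name]}"],
}

# A regular tweet falls through to the "" entry,
# a retweet matches the first condition.
print(pick_directory(conditions, {"retweet_id": 0}))
print(pick_directory(conditions, {"retweet_id": 1408818108224131074}))
```

The returned segments are still format strings; filling in {user[name]} would be a separate step.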

@ExeArco Could you please try this?

Jun 27 '21 07:06 Hrxn

config.set(
    ("extractor", "twitter"),
    "directory", [
        {
            "retweet_id != 0": ["{category}", folder, "archive", "retweets"],
            "": ["{category}", folder, "archive"]
        }
    ]
)

This is what I attempted to use, however it gives me the following error: [twitter][error] DirectoryFormatError: Applying directory format string failed (TypeError: expected str, got dict)

And just to confirm, I am on 1.18.0.

Jun 29 '21 09:06 ExeArco

It was implemented one commit after 1.18.0.
You'll want to overwrite C:/Python39/Lib/site-packages/gallery_dl/util.py with this version of util.py.

Jun 29 '21 11:06 Scripter17

Alright, I updated to 1.18.1-dev and now I get this error instead: [twitter][error] DirectoryFormatError: Applying directory format string failed (TypeError: unhashable type: 'dict')

Jun 29 '21 12:06 ExeArco

Your directory value is a list with a dict as element. It should just be a dict:

config.set(
    ("extractor", "twitter"), 
    "directory", {
        "retweet_id != 0" : ["{category}", folder, "archive", "retweets"],
        "" : ["{category}", folder, "archive"],
    },
)

Jun 29 '21 17:06 mikf

Alright, I fixed that, but now it doesn't seem to be separating them at all; it seems the retweet_id != 0 condition is never triggered. Here are all the config.set calls related to Twitter that I make before running that downloadJob.

config.set(("extractor", "twitter"), "filename", "{user[name]}_{tweet_id}_{date}_{num}.{extension}")
config.set(("extractor", "twitter"), "quoted", True)
config.set(("extractor", "twitter"), "text-tweets", True)
config.set(("extractor", "twitter"), "retweets", "original") 

config.set(
    ("extractor", "twitter"),
    "postprocessors", [
        {
            "name": "metadata",
            "event": "post",
            "filename": "{user[name]}_{tweet_id}_{date}_{num}.data.json"
        }
    ]
)

config.set(
    ("extractor", "twitter"),
    "directory", {
        "retweet_id != 0": ["{category}", folder, "archive", "retweets"],
        "": ["{category}", folder, "archive"]
    },
)

Jun 30 '21 03:06 ExeArco

It is not really possible to differentiate between Tweets and Retweets when setting retweets to "original". It replaces each Retweet entry with its original Tweet, which has a retweeted_status_id_str/retweet_id value of 0. Since you are using Python, you could change the condition before each new Twitter URL to check if the author['name'] field matches the expected username and determine if it's a Retweet like that. Or gallery-dl could update the retweet_id value in such cases, which shouldn't break anything else.
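Since the configuration is set from Python here, the condition could be rebuilt for each profile before its download starts. Below is a sketch of that first suggestion, assuming the metadata exposes author['name'] as described; directory_for is a hypothetical helper, and its folder argument stands in for the folder variable used in the snippets above.

```python
def directory_for(username, folder):
    """Build a "directory" mapping that treats posts by anyone
    other than `username` as retweets."""
    # {!r} inserts the name as a quoted Python string literal.
    condition = "author['name'] != {!r}".format(username)
    return {
        condition: ["{category}", folder, "archive", "retweets"],
        "": ["{category}", folder, "archive"],
    }


mapping = directory_for("targetUser", "targetUser")
for condition in mapping:
    print(repr(condition))
```

The result would then be passed as config.set(("extractor", "twitter"), "directory", mapping) before feeding gallery-dl that user's URL.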

Jul 01 '21 16:07 mikf

I'm using this to archive the user profile themselves, not passing through individual retweet/tweet/status links, so I can't check each URL itself. It would be a much more elegant solution if I could just do it all in one go.

Jul 02 '21 07:07 ExeArco

What's the specific reason for not using extractor.twitter.retweets = true?

Jul 02 '21 12:07 Hrxn

If I set retweets = true, it sets the username of the retweet to the target of my download, not the original poster of what is being retweeted. Example: I want to archive posts made by A. If I set retweets to true, my subfolder with all the retweets will be full of content from B, C, D, E, F, G, but they will all have file names as if A had actually tweeted them.

Jul 02 '21 14:07 ExeArco

You can use retweets: true and replace the {user[name]} in filename with {author[name]}

Jul 02 '21 21:07 Scripter17

Yes, that is exactly what I am using here.

Jul 03 '21 00:07 Hrxn

If I do as Scripter17 has suggested, it works; however, it now doesn't sort out quoted retweets, and looking through the metadata JSON that I have, there doesn't appear to be anything I can use to sort that out, unless I am missing something.

Jul 03 '21 02:07 ExeArco

quoted retweets

Do you mean regular quote tweets or quotes of a retweet? I don't think I've ever seen the latter, but maybe that exists as well.

it appears there is not anything I can use to sort that out

Quoted tweets have a non-zero quote_id, so it'd be something like

"directory": {
    "retweet_id": ["{category}", folder, "archive", "retweets"],
    "quote_id"  : ["{category}", folder, "archive", "quotes"],
    ""          : ["{category}", folder, "archive"],
}

I've also updated the behavior of "retweets": "original" to have a non-zero value for retweet_id (https://github.com/mikf/gallery-dl/commit/414bdc95a342c333c646b273d81bf74304475c53)

Jul 03 '21 20:07 mikf

I'll be honest, I'm not sure exactly what this is called either, but take this example: https://twitter.com/BarackObama is the link I send to download. Further down, Obama quotes the NetflixFilm account (https://twitter.com/BarackObama/status/1408818108224131074), and I would like that put in retweets. Looking through the metadata, it appears quote_id, reply_id, and retweet_id are all 0, so I am not sure how to separate this one out.

Jul 04 '21 01:07 ExeArco

I am still looking for a fix to this, does anyone have one? I have just updated to the latest version and there still seems to be no way to properly separate those tweets out.

Aug 15 '21 08:08 ExeArco

[..] it appears quote_id, reply_id, and retweet_id are all 0 so I am not sure how to separate this one out.

Not sure, but if this is what gets returned by Twitter..

PS D:\> gallery-dl --ignore-config -K 'https://twitter.com/BarackObama/status/1408818108224131074' --option '"quoted"=true' | sls "tweet|quote|reply" -NoEmphasis -Context 0,1

> quote_count
    439
> quote_id
    0
> reply_count
    268
> reply_id
    0
> retweet_count
    792
> retweet_id
    0
>   tweet
> tweet_id
    1408510014847959053
> quote_count
    439
> quote_id
    0
> reply_count
    268
> reply_id
    0
> retweet_count
    792
> retweet_id
    0
>   tweet
> tweet_id
    1408510014847959053

PS D:\>

This tweet seems a bit strange.. If I don't use -o '"quoted"=true' gallery-dl skips this tweet with default settings ("skipping quoted tweet")..

So, something is a bit off here, I guess. I blame Twitter.

Aug 15 '21 15:08 Hrxn

I mean it is a quoted tweet right? So then why is quote_id 0? I really would like to separate these out through gallery-dl

Aug 19 '21 09:08 ExeArco

As it turns out, quote_id is the ID of the quoted Tweet and is only nonzero for Tweets that contain a quote, not for the Tweet being quoted. It is the exact opposite of how retweets behave and I have no idea why I implemented it like that back then.

Instead of checking for quote_id, you could compare user and author and classify it as a quote when both are different.

"directory": {
    "retweet_id"    : ["{category}", folder, "archive", "retweets"],
    "user != author": ["{category}", folder, "archive", "quotes"],
    ""              : ["{category}", folder, "archive"],
}

Aug 23 '21 17:08 mikf

use "locals().get('quote_by')"

Jul 11 '22 01:07 afterdelight
