
[twitter] [feature?] Possible to keep retweets and actual tweets by artists in separate folders?

Open a-washing-machine opened this issue 2 years ago • 15 comments

Sometimes you get twitter accounts with ~500 art pieces, and ~5000 retweets of other artists' works.

Both are relevant to me, but I'd still like a distinction between the two.

I'm thinking something like:

C:\gallery-dl\twitter\ARTIST\
C:\gallery-dl\twitter\ARTIST\retweets\

Or possibly:

C:\gallery-dl\twitter\ARTIST\
C:\gallery-dl\twitter\ARTIST\retweets\RETWEETED TWITTER NAME\

Is that something already possible currently, and I simply need to adjust my config-file, or would that be a new feature?

My download queries look like this:

gallery-dl_1.22.0.exe --config gallery-dl_config.conf https://twitter.com/ARTIST/media
gallery-dl_1.22.0.exe --config gallery-dl_config.conf https://twitter.com/ARTIST

(I use both formats in my download queries.)

My twitter config:

"twitter":
    {
		"username": "[REDACTED]",
		"password": "[REDACTED]",
		"cookies": "twitter.com_cookies.txt",
		"cookies-update": true,
		"retweets": true,
		"quoted": true,
		"replies": true,
		"text-tweets": true
    }

(By the way, "text-tweets": true doesn't seem to do anything? I'd assumed it would download text tweets as text files.)

a-washing-machine avatar Jun 07 '22 15:06 a-washing-machine

This is already possible using "conditional" directory format strings. Something like the following puts retweets and quoted tweets into their own sub-directories:

            "directory": {
                "retweet_id"              : ["{category}", "{user[name]}", "Retweets", "{author[name]}"],
                "locals().get('quote_by')": ["{category}", "{user[name]}", "Quoted"  , "{author[name]}"],
                ""                        : ["{category}", "{user[name]}"]
            }

"text-tweets": true only effects the emitted metadata, for which you have to use a post processor to write it to a file, for example

    "postprocessors": [
         {
            "name": "metadata",
            "event": "post",
            "filename": "{tweet_id}.json"
         }
    ]

(See issue #570)


"username": "[REDACTED]", "password": "[REDACTED]", "cookies": "twitter.com_cookies.txt", "cookies-update": true,

You only need either username & password or cookies. If twitter.com_cookies.txt contains an auth_token cookie, your username & password settings get ignored.
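
As a rough sketch of that advice, a cookies-only twitter section could look like the following. The "extractor" wrapper is the usual top-level layout of a gallery-dl configuration file and is an assumption about how the original config is structured. The cookie file name is the one from the original post. The username and password entries are left out because they would be ignored anyway if the cookie file contains an auth_token cookie.

```json
{
    "extractor": {
        "twitter": {
            "cookies": "twitter.com_cookies.txt",
            "cookies-update": true,
            "retweets": true,
            "quoted": true,
            "replies": true,
            "text-tweets": true
        }
    }
}
```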

mikf avatar Jun 07 '22 18:06 mikf

thanks, this is what i needed too!

afterdelight avatar Jun 08 '22 05:06 afterdelight

That works, thanks for the quick reply! :)

A follow-up question regarding the .json file metadata;

I) I'm guessing there's nothing already built to extract just the tweet's text? The rest of the metadata is just clutter for me. This is NOT an issue that needs solving; I can build something myself to extract the data locally from the file after the fact. I'm just asking whether a solution already exists.

II) For the sake of keeping easier overview of image/video files in the twitter directories, I'd like to know if it is currently possible to keep the .json files separate from the image/video files, either by:

IIa) ...downloading the .json files into a subdirectory or entirely separate directory? ( I.e. C:\gallery-dl\twitter\ARTIST\JSON\, or C:\gallery-dl\twitter\ARTIST_JSON\, or C:\gallery-dl\twitter\JSON\ARTIST\ )

IIb) ...downloading ONLY the .json files, NOT any images/videos, in which case I'd just make a separate gallery-dl directory for just that. (I.e. C:\gallery-dl\twitter\ARTIST\ and C:\gallery-dl_JSON_ONLY\twitter\ARTIST\ )

If neither solution is currently possible, I'll just think of some workaround myself, it'll just be a minor nuisance I suppose. Just figured I'd ask in advance before I begin mass-downloading those .json files. ;-)

a-washing-machine avatar Jun 08 '22 15:06 a-washing-machine

I) just extract the tweet's text?

You can control what gets extracted by setting mode to "custom" and setting a content-format format string:

    "postprocessors": [
         {
            "name": "metadata",
            "event": "post",
            "filename": "{tweet_id}.txt",
            "mode": "custom",
            "content-format": "{content}"
         }
    ]

(the text content of a tweet is stored in content)

IIa) ...downloading the .json files into a subdirectory or entirely separate directory?

Can be done by setting a directory for the metadata post processor. Keep in mind that this value will only be interpreted as a static string with environment variable support. It is not a fancy format string like the regular directory value is.

Storing all .json files in C:\gallery-dl\twitter\ARTIST\JSON would be done by adding "directory": "JSON" to the post processor above.
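
A minimal sketch of that setup, assuming the post processor is placed inside the twitter section and that the .json variant from the first reply is the one wanted here. The "extractor" wrapper is the usual top-level layout of a gallery-dl configuration file. "JSON" is only a static sub-directory name, not a format string.

```json
{
    "extractor": {
        "twitter": {
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{tweet_id}.json",
                    "directory": "JSON"
                }
            ]
        }
    }
}
```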

IIb) ...downloading ONLY the .json files, NOT any images/videos

Can be done with --no-download or the download config option.
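
For the config-file route, a sketch could look like this. It assumes the download option takes a boolean and sits in the twitter section next to the metadata post processor, so only the metadata files get written. The "extractor" wrapper is the usual top-level layout of a gallery-dl configuration file.

```json
{
    "extractor": {
        "twitter": {
            "download": false,
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{tweet_id}.json"
                }
            ]
        }
    }
}
```

Passing --no-download on the command line should have the same effect for a single run without touching the config file.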


All configuration file options and post processor options can be found in docs/configuration.rst with some usage examples in docs/gallery-dl-example.conf

mikf avatar Jun 08 '22 18:06 mikf

Oh cool, thanks! I'm gonna try that and come back in case I mess it up somehow. ^_^#

a-washing-machine avatar Jun 08 '22 18:06 a-washing-machine

Sorry, off topic, but is there any way to display only the tweet count, retweet count and total media count from the -K command?

afterdelight avatar Jun 09 '22 09:06 afterdelight

Alright, everything works, I'm ready to close the topic from my end. :)

Sorry, off topic, but is there any way to display only the tweet count, retweet count and total media count from the -K command?

I messed around with that a bit, not completely sure what you meant, but did you want something like this?

	"twitter":
    {
		"username": "[REDACTED]",
		"password": "[REDACTED]",
		"cookies": "twitter.com_cookies.txt",
		"cookies-update": true,
		"retweets": true,
		"quoted": true,
		"replies": true,
		"text-tweets": true,
		
		"directory": {
		"retweet_id"              : ["{category}", "{user[name]}", "Retweets", "{author[name]}"],
		"locals().get('quote_by')": ["{category}", "{user[name]}", "Quoted"  , "{author[name]}"],
		""                        : ["{category}", "{user[name]}"]
        },

		"postprocessors": [
			 {
				"name": "metadata",
				"event": "post",
				"filename": "{tweet_id}.filetypeformatofyourchoicehere.txt",
				"mode": "custom",
				"content-format": "{content} {tweet_id} {retweet_id} {author[media_count]} {author[statuses_count]}",
				"directory": "WHATEVER"
			 }
		]

		
    },

"content-format": "{content} {tweet_id} {retweet_id} {author[media_count]} {author[statuses_count]}" adds the text, tweet id and retweet id, and then "media count" and "statuses count" (dunno if that's what you wanted), and looks something like this:

blablabla 1234567890123456789 9876543210987654321 6 53

(Be aware that "blablabla" can also contain spaces and line breaks.)

...and writes it into the file path C:\gallery-dl\twitter\ARTIST\Retweets\WHATEVER\1234567890123456789.filetypeformatofyourchoicehere.txt

Is this vaguely in the direction of what you wanted? Yay I'm helping...? ^_^#

a-washing-machine avatar Jun 25 '22 15:06 a-washing-machine

Updated the above comment to include "media count" and "status count" (not sure if that is what you wanted).

I got all those parameters from the JSON file it downloads when you use the config info described in mikf's first reply on this topic.

a-washing-machine avatar Jun 25 '22 19:06 a-washing-machine

No, but it's close enough. All I wanted is to see a user's total tweet count, total retweet count and total media count, displayed in a command prompt or printed to a file.txt.

afterdelight avatar Jun 27 '22 20:06 afterdelight

@afterdelight

no but its close enough

Then I'm gonna close the issue. Or was there something else you needed?

Do you want to get the information only once as an overview, or is it supposed to auto-update itself?

I think it should be possible to set it up so that it'd get the "media count / statuses count" to only be written to a single file per account, once, and then stop and do the next account (this would skip information of retweets from other accounts of course) - each with some easily searchable/extractable filename. And also to update them. Though this probably would have to be a separate command line call from the main gallery download because of the "get one result and then stop" bit.

I think that should be possible, though I'd have to look into it further. Is that something relevant to you?

(And if you need everything combined in one file for overview, this may be done via command line, maybe something like for /R %f in (*.txt) do type "%f" >> c:\Test\output.txt )

a-washing-machine avatar Aug 18 '22 13:08 a-washing-machine

'"media count / statuses count" to only be written to a single file per account, once, and then stop and do the next account'

Yes, this is what I want! How do I do that? Sorry for the late reply.

afterdelight avatar Sep 15 '22 17:09 afterdelight

With a metadata post processor.

    "postprocessors": [
        {
            "name": "metadata",
            "event": "init",
            "filename": "{user[name]}.txt",
            "mode": "custom",
            "format": "{user[media_count]}\n{user[count]}\n"
        }
    ]

You might also want to add "image-range": "0" to "twitter" or use --range 0 to stop before the first media file download.
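
Put together, the config-file variant might look like the sketch below. The post processor is copied unchanged from the snippet above. Nesting it under "extractor" and "twitter" is an assumption about the surrounding configuration file.

```json
{
    "extractor": {
        "twitter": {
            "image-range": "0",
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "init",
                    "filename": "{user[name]}.txt",
                    "mode": "custom",
                    "format": "{user[media_count]}\n{user[count]}\n"
                }
            ]
        }
    }
}
```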


Update to the initial question: Since v1.23.0 it is possible to replace the rather lengthy "locals().get('quote_by')" condition from https://github.com/mikf/gallery-dl/issues/2663#issuecomment-1149051520 with just "quote_id".
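
With that change, the directory setting from the first reply could be written as in this sketch, which requires v1.23.0 or later. Only the second condition differs. The "extractor" and "twitter" wrapper is an assumption about where the setting sits in the configuration file.

```json
{
    "extractor": {
        "twitter": {
            "directory": {
                "retweet_id": ["{category}", "{user[name]}", "Retweets", "{author[name]}"],
                "quote_id"  : ["{category}", "{user[name]}", "Quoted"  , "{author[name]}"],
                ""          : ["{category}", "{user[name]}"]
            }
        }
    }
}
```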

mikf avatar Sep 16 '22 08:09 mikf

Thanks, it worked! This is my config:

"postprocessors": [{
		"directory"		: "",
		"name"			: "metadata",
		"event"			: "init",
		"filename"		: "{user[name]}_{date:%Y%m%d}.info.txt",
		"mode"			: "custom",
		"content-format": "Nick: {author[nick]}\nAccount Created: {author[date]}\nLocation: {author[location]}\nUrl: {author[url]}\nTotal Tweets: {author[statuses_count]}\nTotal Medias: {author[media_count]}\nTotal Retweet: {retweet_count}\nTotal Quote: {quote_count}\nTotal Reply: {reply_count}\nProfile Banner: {author[profile_banner]}\nProfile Picture: {author[profile_image]}"
		}]

afterdelight avatar Sep 17 '22 02:09 afterdelight

You should not use author[…] in your content-format string. If the first tweet is a retweet/quote, this will write data for the wrong user. Just keep everything as user[…].
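
A sketch of the corrected post processor, with every author[…] swapped for user[…] and nothing else changed. It assumes that user exposes the same keys as author (nick, date, location, url, statuses_count, media_count, profile_banner, profile_image), and that the post processor lives in the twitter section of the configuration file.

```json
{
    "extractor": {
        "twitter": {
            "postprocessors": [
                {
                    "directory": "",
                    "name": "metadata",
                    "event": "init",
                    "filename": "{user[name]}_{date:%Y%m%d}.info.txt",
                    "mode": "custom",
                    "content-format": "Nick: {user[nick]}\nAccount Created: {user[date]}\nLocation: {user[location]}\nUrl: {user[url]}\nTotal Tweets: {user[statuses_count]}\nTotal Medias: {user[media_count]}\nTotal Retweet: {retweet_count}\nTotal Quote: {quote_count}\nTotal Reply: {reply_count}\nProfile Banner: {user[profile_banner]}\nProfile Picture: {user[profile_image]}"
                }
            ]
        }
    }
}
```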

mikf avatar Sep 17 '22 12:09 mikf

oh right, thanks for the input. i have corrected my mistakes.

afterdelight avatar Sep 18 '22 05:09 afterdelight