internetarchive-downloader
[Feature Request] NOT boolean for -f?
I get that's what --invertfilefiltering does, but there are some additional files you may want to skip that contain a word you can't filter out except by filtering out everything else. (Does that make sense?) Would something like this be possible?
Example:
-f "(USA)" NOT "(Beta)"
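A minimal Python sketch of the requested NOT behavior: keep a file if its name contains at least one include term and none of the exclude terms. The names keep_file, include and exclude are hypothetical and are not options of the tool. Case-insensitive substring matching is also an assumption.

```python
def keep_file(name: str, include: list[str], exclude: list[str]) -> bool:
    """Keep `name` if it contains any include term and no exclude term.

    Hypothetical helper: matching is case-insensitive substring matching
    (an assumption), and an empty include list means "include everything".
    """
    lowered = name.lower()
    included = not include or any(term.lower() in lowered for term in include)
    excluded = any(term.lower() in lowered for term in exclude)
    return included and not excluded


files = [
    "Game A (USA).zip",
    "Game A (USA) (Beta).zip",
    "Game A (Japan).zip",
]
# Rough equivalent of the requested: -f "(USA)" NOT "(Beta)"
kept = [f for f in files if keep_file(f, include=["(USA)"], exclude=["(Beta)"])]
print(kept)  # ['Game A (USA).zip']
```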
I get that --invertfilefiltering helps, but for something like what I'm doing, I'd need to filter out a good chunk of countries.
Example:
-f "(Demo)" "(Japan)" "(Korea)" "(Europe)" "(Australia)" "(Greece)" "(Germany)" "(Italy)" "(Spain)" "(France)" "(Europe, Australia)" --invertfilefiltering
^Some file names contain these words outside the region tag, so I need to keep the parentheses. The parenthesized part of a file name is the section used for regions.
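This sketch shows why the parentheses matter. It models -f plus --invertfilefiltering as "keep only names that contain none of the filter strings", using case-insensitive substring matching. That model is an assumption about the tool's behavior, and inverted_filter is a hypothetical helper.

```python
def inverted_filter(names: list[str], filters: list[str]) -> list[str]:
    """Keep only names containing none of the filter strings (case-insensitive).

    Assumed model of -f combined with --invertfilefiltering.
    """
    lowered_filters = [f.lower() for f in filters]
    return [n for n in names if not any(f in n.lower() for f in lowered_filters)]


names = [
    "Tour of Europe (USA).zip",
    "Tour of Europe (Europe).zip",
]
# With parentheses, only the region tag is matched.
print(inverted_filter(names, ["(Europe)"]))  # ['Tour of Europe (USA).zip']
# Without them, the title word "Europe" also knocks out the USA release.
print(inverted_filter(names, ["Europe"]))  # []
```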
I'm terrible with coding, so I might not make a lot of sense, but I hope I can help others who are having the same issue as I am.
Thanks for the note, makes perfect sense! Filtering could be improved in a few ways - I'll have a think about this over the weekend and likely add a few additional options.
Hi, thanks for getting back to me. I didn't see that you had replied; I hardly use GitHub, so it's all new to me. I look forward to any and all improvements. Keep up the great work! :D
(Replying by email to john-corcoran's comment of Thu, May 26, 2022, 12:12 AM.)
If I understand correctly, you want file names that match "(USA)" but not "(Beta)", plus names that match "(Beta)" but not "(USA)".
One way to go about this is to have IA downloader write a line-delimited list of an item's files to a text file. Next, use vim or a similar editor to delete the unwanted lines, e.g. :g/\cusa.*beta\|beta.*usa/d (the \c makes the match case-insensitive). Finally, use the modified text file as the list of files for IA downloader to download.
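The list-editing step can also be scripted. This Python sketch assumes the list has one file name per line. The names files.txt, files_filtered.txt and filter_file_list are made up. Mirroring the vim step, it drops every line containing "usa" followed by "beta", or the reverse, ignoring case.

```python
import re
import tempfile
from pathlib import Path

# Same idea as the vim step: "usa" followed later by "beta", or the reverse,
# ignoring case.
BOTH = re.compile(r"usa.*beta|beta.*usa", re.IGNORECASE)


def filter_file_list(src: Path, dst: Path) -> list[str]:
    """Copy a line-delimited file list from src to dst, dropping lines matching BOTH."""
    lines = src.read_text(encoding="utf-8").splitlines()
    kept = [line for line in lines if not BOTH.search(line)]
    dst.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    return kept


# Demo in a throwaway directory; files.txt / files_filtered.txt are made-up names.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "files.txt"
    dst = Path(tmp) / "files_filtered.txt"
    src.write_text(
        "Game (USA).zip\nGame (USA) (Beta).zip\nGame (Beta) (Japan).zip\n",
        encoding="utf-8",
    )
    kept = filter_file_list(src, dst)
    written = dst.read_text(encoding="utf-8")

print(kept)  # ['Game (USA).zip', 'Game (Beta) (Japan).zip']
```

Like the vim pattern, usa.*beta also matches inside longer words (for example, "thousand" contains "usa"). Escaping the parentheses, as in r"\(usa\).*\(beta\)|\(beta\).*\(usa\)", would tie the match to the region tags.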
Implementing complex string matching could turn IA downloader into a messy tangle of regular expression (regex) code. That is one way to look at it: such filtering could be left to a separate program. On the other hand, it might be good for IA downloader to have a file name pattern matcher that uses regex, like sed and Perl on GNU/Linux. The regexes could be read from a text file (as grab-site does) for better compatibility across Linux, Windows, etc., since patterns in a file don't depend on each shell's quoting rules.
The whole "(USA)" NOT "(Beta)" string format is probably too limited, so use regex instead; regular expressions are pretty much all you need for matching patterns of text. For example, to download everything that does not match usa.*beta or beta.*usa, the syntax could be (not implemented as of now):
--invertfilefiltering -f file "filter.txt"
Contents of filter.txt:
/usa.*beta/gi
/beta.*usa/gi
Note the difference: -f file "filter.txt" reads regexes from a file, while -f "pattern" takes a pattern directly.
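This sketch shows how the proposed /pattern/flags lines could be read. It assumes that ';' starts a comment and that only the i and g flags are allowed. The g flag has no effect here, since a file name either matches or it does not. parse_filter_file and wanted are hypothetical; the tool has no such option today.

```python
import re


def parse_filter_file(text: str) -> list[re.Pattern]:
    """Turn lines like /usa.*beta/gi into compiled regexes.

    Assumptions: blank lines and lines starting with ';' are comments;
    'i' means case-insensitive; 'g' is accepted but ignored.
    """
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        end = line.rfind("/")
        if not line.startswith("/") or end == 0:
            raise ValueError(f"expected /pattern/flags, got {line!r}")
        body, flags = line[1:end], line[end + 1:]
        re_flags = 0
        for flag in flags:
            if flag == "i":
                re_flags |= re.IGNORECASE
            elif flag != "g":
                raise ValueError(f"unsupported flag {flag!r} in {line!r}")
        patterns.append(re.compile(body, re_flags))
    return patterns


def wanted(name: str, patterns: list[re.Pattern], invert: bool = True) -> bool:
    """With invert=True (like --invertfilefiltering), keep names matching no pattern."""
    hit = any(p.search(name) for p in patterns)
    return not hit if invert else hit


filter_txt = "; contents of filter.txt\n/usa.*beta/gi\n/beta.*usa/gi\n"
patterns = parse_filter_file(filter_txt)
names = ["Game (USA).zip", "Game (USA) (Beta).zip", "Game (Beta) (USA).zip"]
print([n for n in names if wanted(n, patterns)])  # ['Game (USA).zip']
```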
Also, I don't think this downloader can download the metadata at https://catalogd.archive.org/history/[item_id] (login required). If it did, it should save it to a folder named "itemid~history". The tilde character (~) is not allowed in IA item IDs, so that folder name can't clash with a real item's folder.