internetarchive-downloader

Add parameter for exact file matching since file filter uses multiple word matching

Open seanwo opened this issue 7 months ago • 0 comments

Love the tool and the robust features for retries and multi-part downloads. I had to alter it to do exact file matching to pull an exact file out of an item library. I think the issue is that the files I was trying to pull had spaces in them.

Example usage:

python3 ia_downloader.py download --verify --resume --split 5 -i ialibraryname -f "filename with multiple spaces.zip"

When you use the file filter parameter, it tries to match on every word, even if you put the name in quotes. Since I just wanted one file, I had to hack it to do an exact match for the file filter. Maybe it would be nice to have an officially supported command line parameter for exact matching? I would put in a pull request, but I am not sure what makes sense for the addition. Ideas are welcome, and if I have time I could throw something together. Below is the patch I used to get my job done.

--- ia_downloader.py	2022-04-27 18:55:55
+++ no.timemachine/xbox/ia_downloader.py	2024-07-23 14:24:23
@@ -1070,8 +1070,8 @@
                 cache_file_handler.write(log_write_str)
             if file_filters is not None:
                 if not invert_file_filtering:
-                    if not any(
-                        substring.lower() in file["name"].lower() for substring in file_filters
+                    if not (
+                        file["name"] in file_filters
                     ):
                         continue
                 else:
@@ -1692,7 +1692,7 @@
         "-f",
         "--filefilters",
         type=str,
-        nargs="+",
+        #nargs="+",
         help=(
             "One or more (space separated) file name filters; only files that contain any of the"
             " provided filter strings (case insensitive) will be downloaded. If multiple filters"
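
One possible shape for an officially supported option, as a rough sketch only: keep `nargs="+"` so the filters stay a list, and add an opt-in flag that switches from substring matching to whole-name matching. The flag name `--exactmatch` and the helper `file_is_selected` are hypothetical and do not exist in ia_downloader.py; only `-f`/`--filefilters` and the substring logic are taken from the patch above, and invert filtering is left out for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-alone parser for illustration; not the real ia_downloader.py parser."""
    parser = argparse.ArgumentParser(prog="ia_downloader_sketch")
    # Existing option from the patch context: one or more filter strings.
    parser.add_argument("-f", "--filefilters", type=str, nargs="+")
    # Hypothetical flag (assumption): require the whole file name to match.
    parser.add_argument("--exactmatch", action="store_true")
    return parser


def file_is_selected(name, file_filters, exact_match=False):
    """Return True if a file name passes the filters (hypothetical helper)."""
    if file_filters is None:
        return True  # no filters given, so every file is kept
    if exact_match:
        # List membership: the full name must equal one of the filters.
        return name in file_filters
    # Current behaviour shown in the patch: case-insensitive substring match.
    lowered = name.lower()
    return any(substring.lower() in lowered for substring in file_filters)


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-f", "filename with multiple spaces.zip", "--exactmatch"]
    )
    print(file_is_selected("filename with multiple spaces.zip",
                           args.filefilters, args.exactmatch))
```

Keeping `nargs="+"` matters here: with it commented out, argparse stores a single string, and `file["name"] in file_filters` becomes a substring test against that string instead of a list membership test. That still works when the filter equals the file name, but it would also accept any file whose name is a fragment of the filter.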

seanwo, Jul 29 '24 17:07