Extension filter isn't honored for previously downloaded files

Open SpaceInvaderZim opened this issue 2 months ago • 4 comments

As the title suggests, there may be an issue in how the extension filter interacts with the cache file. I say "may" because, depending on how you expect it to work, this could be seen either as intended functionality or as a bug.

The scenario is as follows:

  1. Download multiple media files from a hosting site (in this case a subreddit on Reddit), but DON'T specify an extension filter.
  2. Delete all the unwanted files in the target folder, but don't delete the cache file. In my case I deleted all gif and mp4 files.
  3. Repeat step 1, but this time specify an extension filter. In my case I added "-e jpg,jpeg,png,webp" to my command line.

The application will now download the latest content from the subreddit, but it doesn't respect the extension filter. That is, this time around it re-downloads all the gif and mp4 files that were manually deleted, even though the new extension filter should keep these files from being "restored".

SpaceInvaderZim avatar Oct 23 '25 15:10 SpaceInvaderZim

Hi @SpaceInvaderZim! I'm Dosu and I’m helping the umd team.

The extension filter is only applied when fetching new media from the source. If a cache file exists, the app uses the cached list of media files for downloads, regardless of the extension filter you specify in subsequent runs. This means files with extensions outside your filter (like gif and mp4) will be re-downloaded if they're present in the cache, even if you deleted them from the folder and set a filter for jpg, jpeg, png, and webp. The filter does not get re-applied to cached results—only to fresh queries—so the cache takes precedence over the filter in this scenario. This behavior is by design in the current implementation, but it can be confusing if you expect the filter to always be respected. For more details, see the relevant CLI and cache logic here and here.
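To illustrate what re-applying the filter to cached results could look like, here is a minimal Go sketch. It is not umd's actual code: the function name `filterByExtension` and the sample file names are hypothetical, and it assumes the cache is simply a list of media file names or URLs without query strings.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// filterByExtension keeps only the entries whose file extension is in allowed.
// An empty allowed list means "no filter", so every entry is kept.
// (Hypothetical helper, not part of umd.)
func filterByExtension(entries []string, allowed []string) []string {
	if len(allowed) == 0 {
		return entries
	}

	// Normalize the allowed extensions: trim spaces, drop a leading dot, lowercase.
	set := make(map[string]bool, len(allowed))
	for _, ext := range allowed {
		ext = strings.ToLower(strings.TrimPrefix(strings.TrimSpace(ext), "."))
		set[ext] = true
	}

	var kept []string
	for _, e := range entries {
		ext := strings.ToLower(strings.TrimPrefix(filepath.Ext(e), "."))
		if set[ext] {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	// Hypothetical cached list produced by an earlier run without a filter.
	cached := []string{"a.jpg", "b.gif", "c.mp4", "d.PNG"}

	// Applying the filter after loading the cache drops the gif and mp4 entries.
	fmt.Println(filterByExtension(cached, []string{"jpg", "jpeg", "png", "webp"}))
	// prints: [a.jpg d.PNG]
}
```

With a step like this between "load cache" and "download", the deleted gif and mp4 files in the scenario above would not be restored.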

dosubot[bot] avatar Oct 23 '25 15:10 dosubot[bot]

@SpaceInvaderZim It's basically what the bot said. The behavior of cached results having precedence over extension filters is intended.

One thing you could do to bypass this behavior is to use the --no-cache parameter when you use the CLI. This way the app will ignore the cached results and respect extension filters. Like in this example:

$ umd-dl -d <directory> -e jpg,jpeg --no-cache <url>

But now that I'm thinking a little more about this feature, maybe it's confusing that cached results have precedence over filters or other flags. I think a better approach would be to have multiple cache levels depending on the parameters set by the user, but I need to think this through and see what would make sense.
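As a rough illustration of the "cache depends on the parameters" idea, one option is to derive the cache key from both the URL and the normalized filter, so that a run with a different filter does not reuse an unfiltered cache. This is only a sketch under that assumption, not how umd works today; `cacheKey` and the example URL are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// cacheKey builds a key from the URL plus the normalized extension filter.
// Extensions are lowercased and sorted so that "-e png,jpg" and "-e JPG,png"
// map to the same key. (Hypothetical helper, not part of umd.)
func cacheKey(url string, extensions []string) string {
	exts := make([]string, 0, len(extensions))
	for _, e := range extensions {
		e = strings.ToLower(strings.TrimSpace(e))
		if e != "" {
			exts = append(exts, e)
		}
	}
	sort.Strings(exts)

	sum := sha256.Sum256([]byte(url + "\n" + strings.Join(exts, ",")))
	return hex.EncodeToString(sum[:])
}

func main() {
	u := "https://example.com/r/pics" // placeholder URL

	// No filter vs. a jpg filter: different keys, so the caches stay separate.
	fmt.Println(cacheKey(u, nil) == cacheKey(u, []string{"jpg"})) // prints: false

	// Same filter in a different order or case: same key.
	fmt.Println(cacheKey(u, []string{"png", "jpg"}) == cacheKey(u, []string{"JPG", "png"})) // prints: true
}
```

The trade-off is that each distinct filter gets its own cache entry, so the first run with a new filter has to query the source again.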

In any case, this is something that I won't have time to work on any time soon, so I suggest using the --no-cache parameter for now.

vegidio avatar Oct 23 '25 16:10 vegidio

@vegidio Thanks for taking the time to answer, again. I have to admit that I hadn't looked at how the cache was implemented until after the bot answered. Although Go isn't a language I've dabbled in before, I can see that I had misunderstood the way the cache works!

Personally, I would have assumed that the cache has a lower priority than the filters, but I may be colored by my own line of work.

Thanks for clearing things up for me - I'll now show myself out :)

SpaceInvaderZim avatar Oct 23 '25 17:10 SpaceInvaderZim

No worries. I added this enhancement to change the behavior.

As I said, I don't have time to look at this now, but I will probably work on it around January.

vegidio avatar Oct 24 '25 20:10 vegidio