syncMyMoodle

Add file name exclusion using UNIX filename pattern matching

Open tyalie opened this issue 3 years ago • 6 comments

Exclude files using a combination of fnmatch and braceexpand in order to give the full Bash file pattern matching experience.

For example, the exclude_files parameter in config.json can now exclude all of the files below with the single pattern Lecture Videos_{zoom,video}*.mp4. That makes sense in my case, where the lecture videos are also uploaded with a semantic name in an extra folder called Video Download.

Lecture Videos_zoom_0_NWFjN2RlMD.mp4
Lecture Videos_video1680865997.mp4
Lecture Videos_video1119115261.mp4
Lecture Videos_video1230373776.mp4
Lecture Videos_video1599374246.mp4

In other words it is a more powerful version of the existing exclude_filetypes config parameter.
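A minimal sketch of the matching described above, using only the standard library. The helper names expand_braces and is_excluded are hypothetical, and the tiny expander is only a stand-in for the braceexpand package (one level of {a,b} alternatives, no nesting, ranges, or escapes); it is not the code from this change.

```python
import fnmatch
import re

# Matches one brace group that contains at least one comma, e.g. {zoom,video}.
_BRACE = re.compile(r"\{([^{}]*,[^{}]*)\}")


def expand_braces(pattern):
    """Expand {a,b} alternatives into a list of plain fnmatch patterns."""
    match = _BRACE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that further brace groups in the tail are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def is_excluded(filename, exclude_files):
    """Return True if the filename matches any expanded exclusion pattern."""
    return any(
        fnmatch.fnmatchcase(filename, glob)
        for pattern in exclude_files
        for glob in expand_braces(pattern)
    )
```

With exclude_files set to ["Lecture Videos_{zoom,video}*.mp4"], is_excluded returns True for each of the five file names listed above. Note that fnmatchcase is always case-sensitive, which is a choice made for this sketch.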

Current limitations are:

  • [ ] it's not possible to supply exclude_files using the command line arguments
  • [ ] You can only give a pattern for the filename, not the full path (i.e. exclusions of whole subfolders)
  • [ ] added a new dependency braceexpand in order to give the Bash file pattern matching experience
    • UNIX file patterns alone don't define the {a,b} expansion
  • [ ] exclude_filetypes still exists for compatibility reasons, even though it could be completely replaced by exclude_files
    • exclude_filetypes could be merged into exclude_files during the config loading phase, removing https://github.com/Romern/syncMyMoodle/blob/e928bab71223b4ec176fd0c3c9f0574056be23cd/syncmymoodle/main.py#L765-L768
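The merge suggested in the last item could look roughly like the sketch below. It assumes that the config is a plain dict and that exclude_filetypes holds bare extensions such as "mp4"; both are assumptions, and merge_exclusions is a hypothetical name, not a function from the repository.

```python
def merge_exclusions(config):
    """Return exclude_files plus one '*.<ext>' glob per excluded file type."""
    patterns = list(config.get("exclude_files", []))
    for filetype in config.get("exclude_filetypes", []):
        # Accept both "mp4" and ".mp4" and turn either into the glob "*.mp4".
        glob = "*." + filetype.lstrip(".")
        if glob not in patterns:
            patterns.append(glob)
    return patterns
```

After such a merge, the download loop would only need to check a single list of patterns.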

tyalie avatar Jan 11 '22 15:01 tyalie

I would prefer a filter using fnmatch or the re module instead of adding a new dependency. Regexes also allow for quite powerful matching capabilities.
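As a rough illustration of the two standard-library options, the sketch below expresses the same exclusion once as a regex and once as plain fnmatch patterns. The function names are hypothetical and the patterns are taken from the example in the issue description.

```python
import fnmatch
import re

# A single regex can express the alternation directly.
regex = re.compile(r"Lecture Videos_(zoom|video).*\.mp4")

# Plain fnmatch has no alternation, so each variant needs its own pattern.
globs = ["Lecture Videos_zoom*.mp4", "Lecture Videos_video*.mp4"]


def excluded_by_regex(filename):
    # fullmatch requires the whole file name to match, like fnmatch does.
    return regex.fullmatch(filename) is not None


def excluded_by_globs(filename):
    return any(fnmatch.fnmatchcase(filename, glob) for glob in globs)
```

Both functions agree on the file names from the issue description; the difference lies in how the pattern has to be written in config.json.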

septatrix avatar Jan 13 '22 17:01 septatrix

I get that. Sadly, fnmatch is a bit limiting because it has no brace expansion, which can lead to a lot of duplication in exclude_files (or would have in my case).

I also thought about regular expressions, but they are far more complex to use than Bash patterns and not user friendly for something like filtering files. I think the syntax is also harder to learn than Bash file matching. (And yeah, regex is probably even more powerful than Bash file matching. But still.)

But tbh the dependency is quite small, and the code could just be copied into its own function inside the main module or an extra file (fewer than 150 lines).

tyalie avatar Jan 13 '22 17:01 tyalie

I think 95% of the use cases can be solved with fnmatch and unless one performs 3 brace expansions it is also trivial to copy and paste the pattern. I say let us go with pure fnmatch and if that turns out to be insufficient we can always expand the functionality in a backwards compatible manner.

septatrix avatar Jan 14 '22 15:01 septatrix

@arandomliz could you share your filters? Maybe it helps to understand why we need fnmatch, and some might be useful as defaults.

Romern avatar Jan 14 '22 15:01 Romern

I currently use this filter to exclude lecture videos that are also in the Lecture Video Download directory under a much clearer, more descriptive name:

"exclude_files": ["Lecture Videos_{video,zoom,untitled}*.mp4"]

Using only fnmatch would result in:

"exclude_files": ["Lecture Videos_video*.mp4", "Lecture Videos_zoom*.mp4", "Lecture Videos_untitled*.mp4"]

Luckily they don't switch here between file types (yet).

tyalie avatar Jan 14 '22 18:01 tyalie

I currently use this filter right now in order to exclude lecture videos that are also in the Lecture Video Download directory with a much clearer and descriptive name:

"exclude_files": ["Lecture Videos_{video,zoom,untitled}*.mp4"]

Why not simply Lecture Videos_*.mp4? Also regarding your case I think a better approach would be to implement blocking of a specific section or module.
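To make the trade-off concrete, here is a small sketch comparing the broad pattern with the three narrower ones. The third file name is invented for illustration and does not come from the thread; whether such a file exists in the course is an assumption.

```python
import fnmatch

names = [
    "Lecture Videos_zoom_0_NWFjN2RlMD.mp4",
    "Lecture Videos_video1680865997.mp4",
    "Lecture Videos_week03_sorting.mp4",  # hypothetical name, not from the thread
]

broad = "Lecture Videos_*.mp4"
narrow = [
    "Lecture Videos_video*.mp4",
    "Lecture Videos_zoom*.mp4",
    "Lecture Videos_untitled*.mp4",
]

# Files caught by the single broad pattern.
broad_hits = [n for n in names if fnmatch.fnmatchcase(n, broad)]

# Files caught by at least one of the narrow patterns.
narrow_hits = [n for n in names if any(fnmatch.fnmatchcase(n, p) for p in narrow)]
```

The broad pattern excludes every file with the Lecture Videos_ prefix, while the narrow patterns leave any other name with that prefix untouched. Which behavior is wanted depends on how the course names its uploads.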

septatrix avatar Jan 14 '22 23:01 septatrix