tiled icon indicating copy to clipboard operation
tiled copied to clipboard

Expose a good interface for ignoring files

Open padraic-shafer opened this issue 2 years ago • 11 comments

When ignore_re_files is passed as an argument to DirectoryAdapter.from_directory(), files matching the RegEx pattern are the only files that get served.

From the name of the parameter and its documentation, I would have expected the opposite behavior--namely that these files would be ignored and not served.

https://github.com/bluesky/tiled/blob/52bede5ea1768b83807556ec7cae8bc147535dc4/tiled/adapters/files.py#L141-L146

padraic-shafer avatar Jan 24 '23 20:01 padraic-shafer

Huh. I wonder how that went backward. I agree that this is clearly a bug.

danielballan avatar Jan 30 '23 21:01 danielballan

Here's my best guess so far.. (I haven't had a chance to test any of this. Documenting it here for whenever someone has time.)

https://github.com/bluesky/tiled/blob/7c6a8dd06d5eb23fac8402a68aedc467983831f2/tiled/adapters/files.py#L287-L318

  • Should line 290 have a "not"? if (ignore_re_dirs is None) or not compiled_ignore_re_dirs.match(
  • ...same for line 308? if (ignore_re_files is None) or not compiled_ignore_re_files.match(
  • In that case, lines 314-317 should no longer be needed. In any case, it looks like the file pattern is being compared to the directory name here.

On the other hand, watchgod is only keeping tabs on the files/directories that are matching the ignore parameters; so that aspect would need modifications as well.

https://github.com/bluesky/tiled/blob/7c6a8dd06d5eb23fac8402a68aedc467983831f2/tiled/adapters/files.py#L454

Here is the relevant code from watchgod.watcher.RegExpWatcher 0.8.2

class RegExpWatcher(AllWatcher):
    def __init__(self, root_path: Union[str, Path], re_files: Optional[str] = None, re_dirs: Optional[str] = None):
        self.re_files: Optional[Pattern[str]] = re.compile(re_files) if re_files is not None else re_files
        self.re_dirs: Optional[Pattern[str]] = re.compile(re_dirs) if re_dirs is not None else re_dirs
        super().__init__(root_path)

    def should_watch_file(self, entry: 'DirEntry') -> bool:
        if self.re_files is not None:
            return bool(self.re_files.match(entry.path))
        else:
            return super().should_watch_file(entry)

    def should_watch_dir(self, entry: 'DirEntry') -> bool:
        if self.re_dirs is not None:
            return bool(self.re_dirs.match(entry.path))
        else:
            return super().should_watch_dir(entry)

padraic-shafer avatar Feb 02 '23 14:02 padraic-shafer

From my point of view, specifying which files/directories to match is just as valid as which ones to ignore. So the simple fix might be to just rename the parameter to include_re_*.

padraic-shafer avatar Feb 02 '23 14:02 padraic-shafer

Here is the relevant code from watchgod.watcher.RegExpWatcher 0.8.2

Btw, at some point will probably need to migrate from watchgod to watchfiles.

https://github.com/bluesky/tiled/blob/23cde094b737987e5ec30d497c87563a5d6dd0d5/requirements-server.txt#L30

padraic-shafer avatar Feb 02 '23 14:02 padraic-shafer

Thanks for digging into this.

I've been vaguely aware (and feeling a bit guilty) that the watchgod-related parameters are under-tested. This is a relic of the early stage of Tiled development where we just wanted to know, "Is this idea even coherent?!" It needs to be revisited with more thought to design.

I think, along with #300, we will need a significant overhaul of the tiled.adapters.files. In the back of my mind, I was thinking we might migrate to wtachfiles as part of that overhaul, and rethink the user-facing parameters as well. Perhaps a simple callable that return a boolean for "include or not" would be best.

danielballan avatar Feb 02 '23 19:02 danielballan

All good :)

A project this size is bound to have some corners to tidy up. On balance, tiled is a fantastic achievement that simplifies finding and accessing datasets.

padraic-shafer avatar Feb 02 '23 20:02 padraic-shafer

As promised #511 completely reworks the file-walker and renders the original issue here moot.

I provides a filter argument that gets a function filter(filepath) -> bool. It uses the same filter for:

  1. The initial crawl of the directory, which does not use watchfiles
  2. Watching the directory, using watchfiles

As a low-level, maximally flexible interface, I think this simple function is correct. But it probably makes sense to expose a glob or regex pattern interface, especially to the CLI, e.g.

tiled serve directory --ignore *.stuff

That is not done in #511. I will re-scope this issue to address that, as some of the considerations raised above may be helpful.

danielballan avatar Jul 19 '23 11:07 danielballan

I started trying to implement this with --include and --exclude options, similar to how they are used in rsync. However, typer does not seem to support preserving the order of these optional parameters when they are interspersed on the command line. That is, it will provide an ordered list of "include" values and a separate ordered list of "exclude" values, but the order of interdigitated "include/exclude" values gets lost.

A workaround might be to:

  • use only --include or only --exclude parameters (which are mutually exclusive--you can't define both in the same command), and
  • allow a "!" prefix in the parameter values which would invert the behavior of the following pattern (i.e., to allow an "exclude" pattern as a valid value for an --include parameter).

Another workaround might use @typer.Typer().callback() to hijack the command line context and manually extract --include and --exclude options in the order they were specified. This seems pretty hacky though, and counterintuitive to using a nice CLI parser.


I'm looking for suggestions.

FWIW, I was thinking of using --include & --exclude for shell-style expansions (?, *, **, ***) and --include-re & --exclude-re for RegEx patterns.

padraic-shafer avatar Aug 12 '23 15:08 padraic-shafer

Another workaround might use @typer.Typer().callback() to hijack the command line context and manually extract --include and --exclude options in the order they were specified. This seems pretty hacky though, and counterintuitive to using a nice CLI parser.

After looking into this a bit more, this approach looks fairly straightforward. Here is a related solution that subclasses the underlying click.Command class to override the parsing behavior.

padraic-shafer avatar Aug 14 '23 13:08 padraic-shafer

I like supporting shell-style and regex-style options, and I like the names proposed.

I'm of two minds about subclassing click.Command. It's good that there is a supported pathway for this, but I would feel more confident that it is worth the added depth-of-integration if we had a couple use cases. I might be inclined to wait until we need it and start with "they're mutually exclusive". OTOH, it is nice to start with a fully complete and correct implementation.

I'm curious what others think here.

danielballan avatar Aug 17 '23 21:08 danielballan

For simplicity we could start with implementing only an "exclude" option. I imagine that most use cases involve exposing all files / objects in the container while hiding only a few exceptions (such as dot files or metadata sidecars).

I'm inclined to start with the RegEx version because using wildcards on the CLI will require protecting them from unintended shell expansion. Wrapping the parameters with single-quotes seems like a tedious, error-prone step that might frequently be overlooked. I'm rethinking whether the shell-style option makes sense here; maybe it should be limited to entries in a configuration file provided to an --exclude-file option?

padraic-shafer avatar Aug 22 '23 00:08 padraic-shafer