dirhash-python

Ignore and Empty Dir Interaction Seems Counterintuitive

Open · matgrioni opened this issue 1 year ago · 2 comments

In using this library, I find the interaction between the ignore patterns and empty dir counterintuitive: it does not produce the results I expect (although it does seem consistent with the standard).

Essentially, if I use empty_dir=True but also ignore an entire directory (for example "venv/"), dirhash.included_paths shows that this combination of parameters includes "venv/." in the list of paths to hash. I don't think this is usually what is desired, because it means the mere presence of venv changes the hash result even though I asked for it to be ignored.
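To make the reported behavior concrete, here is a toy model in plain Python. It does not call dirhash; the tree, the `ignored` helper (a prefix-based stand-in for a gitignore-style pattern like "venv/") and `included_paths_current` are all hypothetical. It applies the standard's definition of an empty directory literally: a directory with no included entries is represented by a "<dir>/." entry when empty directories are enabled, so a venv whose every file is ignored still shows up, and a hash over the path list differs depending on whether venv exists.

```python
import hashlib
from pathlib import PurePosixPath


def ignored(path: str, ignore_dirs: set[str]) -> bool:
    """Hypothetical stand-in for a pattern like "venv/": True if `path`
    is an ignored directory or lies somewhere beneath one."""
    parts = PurePosixPath(path).parts
    return any("/".join(parts[:i]) + "/" in ignore_dirs for i in range(1, len(parts) + 1))


def included_paths_current(files, dirs, ignore_dirs, empty_dirs=True):
    """Literal reading of the current rule: a directory is empty if it contains
    no included files or directories, even when the directory itself is ignored."""
    included = [f for f in files if not ignored(f, ignore_dirs)]
    if empty_dirs:
        # Deepest directories first, so an included empty subdirectory
        # counts as content of its parent.
        for d in sorted(dirs, key=lambda d: d.count("/"), reverse=True):
            if not any(p.startswith(d + "/") for p in included):
                included.append(d + "/.")
    return sorted(included)


def digest(paths):
    """Hash the list of included paths (a stand-in for the real dirhash)."""
    return hashlib.sha256("\n".join(paths).encode()).hexdigest()


with_venv = included_paths_current(
    files=["main.py", "venv/pyvenv.cfg"], dirs=["venv"], ignore_dirs={"venv/"}
)
without_venv = included_paths_current(files=["main.py"], dirs=[], ignore_dirs={"venv/"})
print(with_venv)     # ['main.py', 'venv/.']
print(without_venv)  # ['main.py']
print(digest(with_venv) == digest(without_venv))  # False
```

Processing the deepest directories first mirrors how the quoted definition reads (an included empty subdirectory makes its parent non-empty); dirhash's actual traversal may differ in detail.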

The point of the ignore option is that an ignored path does not affect the result, regardless of its content, existence, emptiness or any other property. The hash should not consider it at all. Apparently, though, empty_dir can bring an ignored path back into consideration. I would expect empty_dir to apply only to paths that are empty but have not been explicitly ignored.

I think the current behavior is consistent with the standard:

A directory is considered empty if it contains no files or directories to include given the Filtering Options.

However, a small change to that wording would give the behavior I expect:

A directory is considered empty if it matches against the provided patterns and contains no files or directories to include given the Filtering Options.
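The difference between the two wordings can be sketched as two predicates over a directory and the list of paths that survived filtering. This is a hypothetical illustration in plain Python, not dirhash code; `is_empty_current`, `is_empty_proposed` and the prefix-based `matches_filter` helper are assumptions standing in for the real Filtering Options.

```python
from pathlib import PurePosixPath


def matches_filter(path: str, ignore_dirs: set[str]) -> bool:
    """Hypothetical filter: False if `path` is, or lies under, an ignored dir like "venv/"."""
    parts = PurePosixPath(path).parts
    return not any("/".join(parts[:i]) + "/" in ignore_dirs for i in range(1, len(parts) + 1))


def is_empty_current(directory: str, included: list[str]) -> bool:
    # Current wording: empty if it contains no files or directories to include.
    return not any(p.startswith(directory + "/") for p in included)


def is_empty_proposed(directory: str, included: list[str], ignore_dirs: set[str]) -> bool:
    # Proposed wording: the directory must itself match the filter
    # AND contain nothing to include.
    return matches_filter(directory, ignore_dirs) and is_empty_current(directory, included)


ignore = {"venv/"}
included = [p for p in ["main.py", "venv/pyvenv.cfg"] if matches_filter(p, ignore)]
print(is_empty_current("venv", included))           # True: "venv/." would be hashed
print(is_empty_proposed("venv", included, ignore))  # False: venv stays fully ignored
print(is_empty_proposed("build", [], ignore))       # True: a genuinely empty dir still counts
```

Under the proposed wording, empty-directory handling can only add entries for directories that already pass the filter, so ignoring a directory removes it from the hash regardless of its contents.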

matgrioni, Sep 04 '22 21:09

Seems like a very relevant point, and yes, it is counterintuitive. As I recall, the whole match/ignore logic (especially for directories) is certainly the most challenging part of the protocol, and the "weakest", least well-defined one.

andhus, Sep 05 '22 12:09

See also https://github.com/andhus/dirhash-python/issues/8

andhus, Sep 05 '22 12:09