afero icon indicating copy to clipboard operation
afero copied to clipboard

Add new filtering fs: FilePredicateFs

Open satotake opened this issue 3 years ago • 7 comments

While golang's built-in regexp package has the advantage, constant time guarantees, it (RE2) has some limitations to express complex rules compared to other languages (PCRE2). (For example, it does not have backtracking.) Here, I would like to introduce a new filtering backend, PredicateFs, to make afero's filtering more flexible and composable.

PredicateFs is a filtered view on predicates which takes file path as an argument. Any file will be treated as non-existing when the predicate returns false. Unlike RegexpFs, the fs targets not file name but file path. Like, RegexpFs, files will not be created when the predicate returns false and directories are always not filtered.

satotake avatar Jun 19 '21 12:06 satotake

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jun 19 '21 12:06 CLAassistant

This one is pretty nice as it may filter according to any kind of path related rule, not just a regular expression.

You removed a #from the readme of HttpFs

Imo it shouldn't handle directories in any special way. Might as well provide access to the internal Fs in the predicate function in order to e.g. check if the file is a directory or do other checks with the file path. This might need some synchronozation mechanism, I'd guess or one assumes that the Fs is goroutine safe. I wonder what the thoughts behind that reasoning were? Same as RegexpFs?

I could think of a use case where I want my Fs to be only able to interact within specific directories somewhat like BasePathFs but with multiple directories e.g. provided in a regular expression or in a list.

jxsl13 avatar Jan 09 '22 21:01 jxsl13

You removed a #from the readme of HttpFs

This is not related with this request directly but I think HttpFs is not filtering fs.

I wonder what the thoughts behind that reasoning were? Same as RegexpFs?

Good point. Though I am not sure of the rationales of RegexpFs, probably they are the same.

Consider the situation where we

  • want to include only files with .go extension
  • use ~fs.WalkDir~ defined walk function, which calls Open internally

(yes, this will be minor)

If we apply the function with both files and dirs, strings.HasSuffix(s, ".go") will be wrong for files in sub dir. dir/example.go is skipped surprisingly(?) because strings.HasSuffix("dir", ".go") is false. This is easy to go wrong in my opinion.

Instead of it, as you describe, it will be clear to combine PredicateFs with BasepathFs etc. Or, we can handle directories as I wrote the test though it is a little verbose (.hidden).

satotake avatar Jan 10 '22 06:01 satotake

In case that you provide the underlying Fs as parameter to the predicate function, you might be able to check whether the file path is actually a file or a directory with a Stat call and then go for the file's suffix.

^ I think that this would make things way more complicated, for sure and potentially more error prone.

Might as well rename PredicateFs to FilePredicateFs in order to make it clearer that this abstraction is only targeting files.

jxsl13 avatar Jan 10 '22 09:01 jxsl13

@jxsl13 thanks for reviewing. Replaced PredicateFs with FilePredicateFs

satotake avatar Jan 10 '22 16:01 satotake

Sorry for being very late to the game, this looks very useful. I have read the discussion above, but I suspect that we're missing out on some potential when we apply this to files (and not dirs) only. If I walk a directory I certainly want to be able to skip a dir before reading it.

How about making the predicate:

func(isDir bool, path string) bool

bep avatar Jul 14 '22 07:07 bep

@bep thank you. I referred to RegexpFs and it skips all directories internally. So does this fs.

satotake avatar Jul 14 '22 16:07 satotake