gitignore.rs

Why does this recurse through all directories from the gitignore location to check if one file is ignored or not?

Open btipling opened this issue 8 years ago • 3 comments

I've been going through the source code of this project trying to understand how it works. It looks like every time a path is given to this library to check whether it's ignored, the library walks the entire directory tree (iteratively, via a loop, rather than by recursion), collects every file it finds that isn't excluded into a list, and then checks whether the given path is missing from that list.

What if the given file doesn't exist? What if the library user just wanted to know if a hypothetical file matches any of the rules?

Why wouldn't it have been enough to simply make file_is_excluded public and not write is_excluded or included_files at all? Wouldn't file_is_excluded answer the question of whether the given path is excluded? It takes the found file path and matches it against the rules, so why would it be necessary to iterate through all the paths?

Maybe I'm not following the code well, but based on reading through it I came to wonder these things. Imagine wanting to see if a single file in the intellij source code was ignored or not. This is a very large project. You'd have to parse thousands of directories and add tens of thousands of files to your list and then match to see if that single file was not found in that gigantic list. Now imagine you wanted to do this for many files. Each time you'd have to build that list again and again. Sorry if I'm confused about anything.
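To illustrate the point about hypothetical files, here is a minimal sketch of a rule check that never touches the filesystem. It is not the crate's code: `wildcard_match` and `matches_any_rule` are hypothetical names, and the sketch assumes a heavily simplified rule syntax (only `*`, matched against the final path component, with no negation, anchoring, or directory-only rules).

```rust
/// Returns true if `text` matches `pattern`, where `*` matches any run of
/// characters that does not contain `/`. Hypothetical helper for illustration.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    matches_from(&p, &t)
}

fn matches_from(p: &[char], t: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: match only if the text is exhausted too.
        None => t.is_empty(),
        Some((&'*', rest)) => {
            // Let `*` consume 0, 1, 2, ... characters, never crossing a `/`.
            for i in 0..=t.len() {
                if matches_from(rest, &t[i..]) {
                    return true;
                }
                if i < t.len() && t[i] == '/' {
                    break;
                }
            }
            false
        }
        // Literal character: it must equal the next character of the text.
        Some((c, rest)) => match t.split_first() {
            Some((d, t_rest)) if c == d => matches_from(rest, t_rest),
            _ => false,
        },
    }
}

/// Checks the final component of `path` against every rule.
/// Pure string work: the path does not need to exist on disk.
fn matches_any_rule(rules: &[&str], path: &str) -> bool {
    let name = path.rsplit('/').next().unwrap_or(path);
    rules.iter().any(|rule| wildcard_match(rule, name))
}
```

With this shape, a call such as `matches_any_rule(&["*.log"], "does/not/exist/debug.log")` can be answered without walking any directories, and its cost depends on the number of rules rather than on the size of the repository.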

btipling avatar Jan 22 '17 02:01 btipling

It looks like you originally had this code base working by just checking the file path against the found rules until 4b2489115f1e37c4fd01988f974093c8ecd34dcc. You have the following comment for that commit:

The is_excluded method was implemented without the checks for parent directories excluding the file. This method has now been implemented in terms of included_files which solves the problem, test coverage has been added.

I'd think that you wouldn't need to grab all the files in the entire repository to check whether a path is excluded because one of its parent directories was ignored. Wouldn't you just have to run the passed-in path through the existing rules one component at a time? I also note that you're not picking up additional .gitignore files in the repository. You can actually have multiple .gitignore files in a repository, and the rules in a nested ignore file apply only to the files and directories under the directory where that .gitignore lives.
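A minimal sketch of that component-by-component idea follows. It is an illustration under stated assumptions, not the library's implementation: `first_excluded_prefix` and `is_excluded` are hypothetical functions, and each rule is assumed to be a plain file or directory name (no globs, negation, or anchoring).

```rust
/// Walks `relative_path` one component at a time and returns the shortest
/// prefix whose last component matches a rule, if any.
/// Assumption: rules are plain names, compared for exact equality.
fn first_excluded_prefix(rules: &[&str], relative_path: &str) -> Option<String> {
    let mut prefix = String::new();
    for component in relative_path.split('/').filter(|c| !c.is_empty()) {
        if !prefix.is_empty() {
            prefix.push('/');
        }
        prefix.push_str(component);
        // If a parent directory (or the file itself) matches a rule,
        // everything below it is excluded, so we can stop early.
        if rules.contains(&component) {
            return Some(prefix);
        }
    }
    None
}

/// A path is excluded if it, or any of its parent directories, matches a rule.
fn is_excluded(rules: &[&str], relative_path: &str) -> bool {
    first_excluded_prefix(rules, relative_path).is_some()
}
```

For example, `first_excluded_prefix(&["node_modules"], "web/node_modules/pkg/index.js")` yields `Some("web/node_modules")`. The work is proportional to the number of path components times the number of rules, and no directory listing is needed.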

I'm sorry if I'm mistaken in any of this. It's entirely likely I just totally got it wrong reading through this source code.

btipling avatar Jan 22 '17 02:01 btipling

@btipling Hey! Thanks for taking the time to write such detailed comments and review the source code. Open source is great!

The code as it stands doesn't actually work for repositories with multiple .gitignore files, but supporting them is something I intend to do. The repo evolved from something that scratched an itch for an external project (namely getting a recursive list of the files not ignored by a single root ignore file), but from there it became an actual gitignore-checking library more or less accidentally. As such, I strongly think the entire thing needs redoing from scratch to:

  1. Handle multiple .gitignore files at all.
  2. Have speed as a primary concern.
  3. Make the library as close to zero-copy as possible.
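As a rough sketch of what the first item involves, the scoping rule described earlier in the thread (a nested .gitignore applies only beneath its own directory) could look like the following. `IgnoreFile`, `in_scope`, and `applicable_rules` are hypothetical names introduced for illustration, and the sketch assumes the ignore files have already been read and are listed from the repository root down to the deepest directory.

```rust
/// One parsed ignore file: the directory it lives in (relative to the
/// repository root, "" for the root itself) and its raw rule strings.
/// Hypothetical type, not part of the crate.
struct IgnoreFile<'a> {
    dir: &'a str,
    rules: &'a [&'a str],
}

/// True if `path` lies beneath `dir`. The check respects component
/// boundaries, so "docs" is in scope for "docs/a.pdf" but not "docs2/a.pdf".
fn in_scope(dir: &str, path: &str) -> bool {
    if dir.is_empty() {
        return true;
    }
    match path.strip_prefix(dir) {
        Some(rest) => rest.starts_with('/'),
        None => false,
    }
}

/// Collects the rules that apply to `path`, keeping the order of `files`.
/// Both fields borrow existing data, so no rule strings are copied.
fn applicable_rules<'a>(files: &[IgnoreFile<'a>], path: &str) -> Vec<&'a str> {
    let mut out = Vec::new();
    for file in files {
        if in_scope(file.dir, path) {
            out.extend_from_slice(file.rules);
        }
    }
    out
}
```

Given a root file with the rule `target` and a file in `docs` with the rule `*.pdf`, `applicable_rules` returns both rules for `docs/guide.pdf` but only `target` for `src/guide.pdf`. Keeping root rules first would let a later "last match wins" pass give deeper files precedence, which is how git itself treats nested ignore files.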

I've been working on the above a little bit in my spare time, inspired by @BurntSushi's ignore project. I plan to try to reimplement gitignore using RegexSet for the rules and to avoid doing any FS operations at all (you could implement it so that you only check whether the file actually exists as the very last step, if at all, as you rightly point out above).

I hope to be able to finish cobbling something together on this ASAP, and will endeavour to push my branch shortly so perhaps you could have a look?

Anyway, hope this explains how I ended up here a bit. Thanks again for such a great comment, I really appreciate you taking the time to read through the source properly! 👍

nathankleyn avatar Feb 13 '17 08:02 nathankleyn

No problem, I learned a lot about .gitignore in the process. I personally decided to only have partial gitignore support in my project as it seems pretty difficult to have performant behavior with gitignore. At least within the time frame I was willing to dedicate to the task. Looking forward to your changes!

btipling avatar Feb 15 '17 00:02 btipling

Man, I really wish you had fixed this. I spent a lot of time working with this library only to realize it was dog slow, taking between 5 and 30 seconds to tell me whether a directory was excluded or not.

jmsunseri avatar Mar 20 '23 10:03 jmsunseri

Hi everybody. After much consideration I've decided I'm going to archive this crate, about which you can read more here:

https://github.com/nathankleyn/gitignore.rs#project-status

It's a tough decision to make, but I want to be realistic that the crate isn't in a good state: it needs a total overhaul to fix the kinds of issues that are being reported, but I haven't the time to commit to it. I can't reasonably hand over what is here to a new maintainer, so instead I have made some recommendations in the README for crates you should consider using instead for ignoring files. Please let me know if you need assistance with migrating should you be using this crate.

I'm really sorry I never got back here, hope you can understand though and that the README suggestions are useful.

nathankleyn avatar Apr 19 '23 13:04 nathankleyn