Accurate gitignore generator
Project description
.gitignore generators output all potentially applicable rules based on a project's language or platform, whether or not those rules will actually be used.
An accurate generator would only include useful ones for a project.
At first glance, here are two ways of doing this.
- Dirty: output rules from existing generators, filter out those which won't have any effect (i.e. path does not exist);
- Clean: output rules from smart detections of dev tools and dependencies.
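As a rough illustration of the "dirty" approach, the sketch below filters a list of ignore rules down to those that currently match something in the project. The function names are hypothetical, and the matching is a simplification: it relies on Python's `fnmatch` rather than git's full pattern rules, so negations (`!rule`) and `**` are not handled.

```python
import fnmatch
import os


def list_paths(root):
    """Collect every file and directory under root as (relative POSIX path, basename)."""
    found = []
    for current, dirs, files in os.walk(root):
        for name in dirs + files:
            rel = os.path.relpath(os.path.join(current, name), root)
            found.append((rel.replace(os.sep, "/"), name))
    return found


def rule_has_effect(rule, paths):
    """Rough check: does this ignore rule match at least one existing path?"""
    pattern = rule.strip().strip("/")
    if "/" in pattern:
        # Rules containing a slash are compared against the path relative to the root.
        return any(fnmatch.fnmatchcase(rel, pattern) for rel, _ in paths)
    # Other rules may match a file or directory name at any depth.
    return any(fnmatch.fnmatchcase(name, pattern) for _, name in paths)


def filter_rules(rules, root):
    """Keep only the rules that currently match something under root."""
    paths = list_paths(root)
    kept = []
    for rule in rules:
        stripped = rule.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no rule
        if rule_has_effect(stripped, paths):
            kept.append(stripped)
    return kept
```

Scanning the tree once and reusing the resulting list for every rule avoids walking the file system again for each line.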
Relevant Technology
The generator must use a language or platform that can run :
- on any operating system;
- alongside any other installed language, platform or dev tool;
- ideally without any dependency.
The generator would ideally support all languages and platforms that could benefit from it.
Complexity and required time
Complexity
- [ ] Beginner - This project requires no or little prior knowledge of the technolog(y|ies) specified to contribute to the project
- [x] Intermediate - The user should have some prior knowledge of the technolog(y|ies) to the point where they know how to use it, but not necessarily all the nooks and crannies of the technology
- [ ] Advanced - The project requires the user to have a good understanding of all components of the project to contribute
Required time (ETA)
- [ ] Little work - A couple of days
- [x] Medium work - A week or two
- [ ] Much work - The project will take more than a couple of weeks and serious planning is required
Categories
- [ ] Mobile app
- [ ] IoT
- [ ] Web app
- [ ] Frontend/UI
- [ ] AI/ML
- [ ] APIs/Backend
- [ ] Voice Assistant
- [x] Developer Tooling
- [ ] Extension/Plugin/Add-On
- [ ] Design/UX
- [ ] AR/VR
- [ ] Bots
- [ ] Security
- [ ] Blockchain
- [ ] Futuristic Tech/Something Unique
What is the scope of this task? Doing something like the tool in this link ?
When I saw this idea, I thought it was a project that automatically creates a .gitignore file via the command line. For example, fetching the technologies from files in this repository and creating a .gitignore file.
The projects you're referring to are the ones which I call inaccurate, although a dirty implementation of my proposal could be based on those.
It could be a CLI, and/or an IDE plugin.
I'm not quite sure how that detection should work. For example, detecting an IDE does not mean a specific project is opened with it. Otherwise, interesting.
@Idrinth there are solutions that are considered dirty, for example
For the clean solution, it seems to me that this could be solved with project-based plugins. For example, if it is a Next.js project, only the lines related to Next.js should be taken from the fetched Node .gitignore template. This would have to be done for every language and technology, with the plugins built through community contributions. 🤔
> detecting an ide does not mean a specific project is opened with it
Opening a project in an IntelliJ IDE creates an .idea directory at the project's root, opening a project in Visual Studio Code creates a .vscode directory at the project's root.
And half the time (OK, often, but maybe not that often) someone committed that directory and the project is opened by something else, let's say Sublime Text.
What's the issue then? If the developer uses Sublime Text, there will be nothing related to that choice that requires any .gitignore update.
There is a boiled-down concept here that actually makes a lot of sense: have a CLI that communicates with the files in GitHub's template repo. Have the user select a template from a menu, which will generate the "dirty" ignore file. After this, simply read the new file line by line and test if the path exists; if it doesn't, cut that line out of the file. I'd be happy to work on this project if anyone else is keen.
EDIT: Just to extend upon this idea: that loop would also use something like a lookup table that would match the string to a "handler" function, which would in turn implement rules / actions / checks to be performed should that string be in the ignore file.
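One way to picture the lookup table mentioned in the EDIT is sketched below. The table contents, the handler names and the fallback behaviour are all assumptions made for illustration; the comment does not specify them.

```python
import os


def path_exists(root, rule):
    """Default handler: keep the rule if the literal path exists in the project."""
    return os.path.exists(os.path.join(root, rule.strip("/")))


def has_extension(extension):
    """Build a handler that keeps the rule if any file ends with the given extension."""
    def check(root, rule):
        for _, _, files in os.walk(root):
            if any(name.endswith(extension) for name in files):
                return True
        return False
    return check


# Hypothetical lookup table: ignore-file line -> handler deciding whether to keep it.
HANDLERS = {
    "node_modules/": path_exists,
    "*.log": has_extension(".log"),
}


def keep_rule(root, rule):
    """Look up the handler for a rule, falling back to a plain existence check."""
    handler = HANDLERS.get(rule, path_exists)
    return handler(root, rule)
```

Rules without a dedicated handler fall back to the simple "does this path exist" test described in the comment above.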
Suggestion: before asking the user to select which gitignore files to use, filter out from the list those that don't produce any match.
> Suggestion: before asking the user to select which gitignore files to use, filter out from the list those that don't produce any match.
My only concern there is CPU cycles. There would need to be a very efficient algorithm to sort and check potentially millions of lines of text against a recursive scan of the current working directory. It could become a very memory-heavy process very quickly, especially for larger projects such as a Ruby on Rails or JHipster project: these are rather large codebases (many files), and for every line in the gitignore we need to check against each individual file.
Not if you put the working directory content in an array first, then check it against the gitignore files.
> Not if you put the working directory content in an array first, then check it against the gitignore files.
That's actually the exact process I had in my head when I wrote that. Remember, for each item in the array (that represents a file in the working directory tree) we need to call a function that either magically has every line from every file in this repo stored and sorted, so we know which matches are relevant to which theme/framework/whatever, OR we make an HTTP call to read each one of the files and see if we get matches that way. But any match that we do get needs to be stored not only as a match, but as a match from Foo.gitignore (for example).
And we need to repeat that for every file in the current project. You either end up with a very large install size because of the misc data that we may need, or we have high bandwidth usage because of the HTTP calls.
If you have a different solution I'm all ears haha.
- The app would clone the gitignore repo in a fixed location outside the project (for example ~/.cache/gitignore) and pull it at run
- The app would create an array for each file, excluding negative rules and subdirectory rules
- The app would only store the gitignore name in an array when it matches, but not the matching rule itself
- The app would immediately stop trying to check rules in a file once one already matched, and try the next one instead
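The four points above could look roughly like the following sketch. It assumes the template repo has already been cloned into a local cache directory (the clone/pull step is left out), and the function names are made up for the example. Only template names are recorded, and each template is abandoned as soon as one of its rules matches.

```python
import fnmatch
import os
from pathlib import Path


def load_rules(template_path):
    """Read a template, skipping comments, negative rules and subdirectory rules."""
    rules = []
    for line in Path(template_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        pattern = line.strip("/")
        if "/" in pattern:
            continue  # subdirectory rule such as docs/_build/
        rules.append(pattern)
    return rules


def project_names(root):
    """Scan the project once and keep every file and directory name."""
    names = set()
    for _, dirs, files in os.walk(root):
        names.update(dirs)
        names.update(files)
    return names


def matching_templates(cache_dir, project_root):
    """Return the names of cached templates that have at least one matching rule."""
    names = project_names(project_root)
    matched = []
    for template in sorted(Path(cache_dir).glob("*.gitignore")):
        # any() stops at the first rule that matches, so the rest of the file is skipped.
        if any(fnmatch.filter(names, rule) for rule in load_rules(template)):
            matched.append(template.stem)
    return matched
```

Because the project is scanned a single time and nothing but template names is stored, memory use stays proportional to the number of files in the project rather than to the number of rules.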
I actually implemented something similar in Go.
It is just a fun project so I didn't really add too many features, it just checks for a .gitignore file in the directory where the tool is used and deletes the lines that aren't necessary.
Well, it's not entirely what I suggested, but it's a step :)
I have a good initial version working! There might be more to do, but mostly I only need a few more language deterministic patterns before I'm ready to call it v1.
Latest release (GitHub) | Repository (GitHub)
VS Code Extension (VS Code Marketplace) | VS Code Extension Repository (GitHub)
Binary verified working on:
- macOS
  - M1
  - Intel
- Windows
  - x386
- Linux
  - x86_64
How it works
The general flow is this:
- Populate the cache from gitignore (or update if needed)
- Template matching:
  - Algorithm 1: Deterministic:
    - Iterate through map of Glob patterns with common file types on the project. If it finds a type of file glob directly associated with a template (e.g. *.{ts,js}x? for Node), this template is selected
  - Algorithm 2: Process of elimination - If Algorithm 1 did not find matches:
    - Iterate through all languages in template list, eliminating any glob patterns that are not found in the project. If a language has 1 or more patterns that DO match, they are selected.
  - In both methods, multiple can be matched
- With the resulting templates:
  - If there is only one candidate, auto select it
  - If there are multiple, trigger multi select
- Ask if user wants to clean up unused ignore lines
  - The cleanup process is as such:
    - Iterate each line, ignore comment lines
    - When a line is found and a pattern matches a file on the project, go back and collect the previous group of comment lines that preceded it
    - These become the output
    - Output all the remaining lines together
- Prepend section comment with language name for each language file in output (if there is more than 1)
- Ask to overwrite/append/skip file if already exists
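The cleanup step in the flow above can be sketched as follows. This is not gi_gen's actual code (which is not shown in this thread), only a minimal Python rendering of the described behaviour; `clean_template` and the `matches` callback are hypothetical names, and `matches` stands in for whatever check decides that a rule hits a file in the project.

```python
def clean_template(lines, matches):
    """Keep matching rules plus the comment group that precedes them.

    lines   -- the lines of a .gitignore template
    matches -- callable taking a rule and returning True if it matches the project
    """
    output = []
    group = []        # comment lines waiting for a matching rule
    in_rules = False  # True once a rule has been seen after the current group
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            if in_rules or not line:
                # A blank line, or a comment after rules, starts a new group.
                group = []
                in_rules = False
            if line:
                group.append(line)
            continue
        in_rules = True
        if matches(line):
            output.extend(group)  # emit the preceding comments once
            group = []
            output.append(line)
    return output
```

With a Node-style template and a project containing only `node_modules` and `dist`, this keeps the two matching rules together with their section comments and drops everything else.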
Any suggestions welcome.
Would love some help with static file language detection, if you know any project files that are usually consistent per language/project I would love to hear it :) such as package.json for Node, __init__.py for Python, etc.
I am keeping track of all the languages I statically check for right here: https://github.com/chenasraf/gi_gen/issues/2
Feel free to add to the list so I can implement 🙏🏼
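For readers wondering what such static detection might look like, here is a small sketch built on the marker files named in this thread (package.json, `__init__.py`, and the .idea directory mentioned earlier). The `MARKERS` table and the template names are illustrative assumptions, not the list gi_gen actually uses.

```python
from pathlib import Path

# Hypothetical marker table: template name -> file or directory names that
# strongly suggest the template applies.
MARKERS = {
    "Node": ["package.json"],
    "Python": ["__init__.py"],
    "JetBrains": [".idea"],
}


def detect_templates(root):
    """Return the templates whose marker is found anywhere under root."""
    root = Path(root)
    detected = []
    for template, patterns in MARKERS.items():
        # next(..., None) stops the recursive search at the first hit.
        if any(next(root.rglob(pattern), None) is not None for pattern in patterns):
            detected.append(template)
    return detected
```

Adding support for another language or tool then only requires a new entry in the table.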
Also created a VS Code extension that uses the above program :)
@KaKi87, I hate to make many comments in a row, but I am really looking forward to my submission being reviewed. If you feel it is appropriate, I will appreciate closing this issue, or, letting me know what can be changed, to make that happen.
Thank you for the idea, I loved working on this.
I ran gi_gen on an existing project using an IntelliJ+Node+Electron stack.
- Only Node was detected, which means the generated file in overwrite mode will always be incomplete;
- Duplicate entries may be generated in append mode;
- Universally ignorable files & directories (e.g. dist, build, config.js, etc.) are either detected as language-specific or undetected.
Here's the initial content :
.idea
node_modules
config.js
.parcel-cache
dist
build
Here's the resulting diff in overwrite mode :
diff --git a/.gitignore b/.gitignore
index 38f2865..82925f5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,4 @@
-.idea
-node_modules
-config.js
-.parcel-cache
-dist
-build
\ No newline at end of file
+# Dependency directories
+node_modules/
+# Nuxt.js build / generate output
+dist
\ No newline at end of file
Here's the resulting diff in append mode :
diff --git a/.gitignore b/.gitignore
index 38f2865..69c094d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,4 +3,8 @@ node_modules
config.js
.parcel-cache
dist
-build
\ No newline at end of file
+build
+# Dependency directories
+node_modules/
+# Nuxt.js build / generate output
+dist
\ No newline at end of file
Additionally, when using this particular mode, I would suggest not removing matching lines even if no template contains it, or asking the user's permission to do so.
Thank you for your interest in my idea.
Thanks for the reply @KaKi87 :)
> Only Node was detected, which means the generated file in overwrite mode will always be incomplete
Yes, I need more examples of files to test against for more template types, I am keeping track of what's possible right now via this issue. I will add IDEs and more
> Duplicate entries may be generated in append mode ;
True, I am not modifying the existing contents of the gitignore file, only the content that GI Gen generates, which then gets appended or replaces the file. It's definitely a point to improve upon. Generally, I guess we would want both cleanup logic and dedupe logic applied to the final output file, not only to the generated content before it is appended.
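A possible shape for that dedupe logic in append mode is sketched below. The function names are hypothetical, and treating `node_modules` and `node_modules/` as the same rule is a deliberate simplification: in git the trailing slash restricts the rule to directories.

```python
def normalize(rule):
    """Treat 'node_modules' and 'node_modules/' as the same rule (a simplification)."""
    return rule.strip().rstrip("/")


def append_without_duplicates(existing, generated):
    """Append generated lines to existing ones, skipping rules already present."""
    seen = {
        normalize(line)
        for line in existing
        if line.strip() and not line.strip().startswith("#")
    }
    result = list(existing)
    pending = []       # comments waiting for a rule that is actually new
    seen_rule = False  # True once a rule followed the pending comments
    for raw in generated:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if seen_rule:
                pending, seen_rule = [], False  # a new comment group begins
            pending.append(line)
            continue
        seen_rule = True
        key = normalize(line)
        if key in seen:
            continue  # duplicate of an existing or earlier generated rule
        seen.add(key)
        result.extend(pending)
        pending = []
        result.append(line)
    return result
```

Applied to the append-mode example in this thread, nothing would be added, because both generated rules are already covered by the existing file.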
> Universally ignorable files & directories (e.g. dist, build, config.js, etc.) are either detected as language-specific or undetected.
Do you have a suggestion on what to do there? Should I ignore some specific examples when matching? A build directory is probably a rule in a lot of templates, I can't think of a way without blacklisting that line specifically.
> Additionally, when using this particular mode, I would suggest not removing matching lines even if no template contains it, or asking the user's permission to do so.
Is the prompt to clean unused lines not what you mean? Can you elaborate?
> Do you have a suggestion on what to do there?
I asked myself about this before posting, and knew that you'd ask as well, unfortunately I'm as clueless as you. :sweat_smile:
Not only do those directories exist in many templates, as you know, but the developer might not even use such a directory for output: they might store build scripts/tools in it and use a differently named directory to store generated builds.
> Is the prompt to clean unused lines not what you mean?
No, that one works fine.
> Can you elaborate?
gi_gen's overwrite mode removed .idea, config.js, .parcel-cache and build from the file, because those lines didn't exist in the selected template, although they all matched something in the project.
In that case, the tool should preserve those, or ask the user.
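The "preserve" option could be approximated as in the sketch below. The names are hypothetical, `matches` stands in for whatever project check the tool performs, and the section comment added before preserved lines is an assumption. Asking the user instead would simply replace the unconditional append with a prompt.

```python
def normalize(rule):
    """Compare rules without regard to a trailing slash (a simplification)."""
    return rule.strip().rstrip("/")


def overwrite_preserving(existing, generated, matches):
    """Build the new file from generated lines, keeping old rules that still match.

    matches -- callable taking a rule and returning True if it matches the project
    """
    generated_keys = {
        normalize(line)
        for line in generated
        if line.strip() and not line.strip().startswith("#")
    }
    preserved = []
    for raw in existing:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Keep old rules that the templates do not cover but that still match.
        if normalize(line) not in generated_keys and matches(line):
            preserved.append(line)
    result = [line.strip() for line in generated if line.strip()]
    if preserved:
        result.append("# Preserved from the previous .gitignore")
        result.extend(preserved)
    return result
```

On the example above, the four removed lines would be carried over after the generated Node rules instead of being lost.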