Performance problem when many ignored files are present in the repo
Related to #58
Listing all ignored files and filtering them out afterwards can have a big performance impact when an ignored folder contains many files.
Maybe we should call ls-tree to obtain the file list:
W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git ls-tree -r HEAD --name-only | head -n 15
.coveragerc
.project
.pydevproject
.vscode/launch.json
.vscode/settings.json
.vscode/tasks.json
alfasim_gui.spec.yml
dist/all/.gitignore
docs/.gitignore
docs/ALFAsim_Technical_Manual___EN_US.pdf
docs/conf.py
docs/gui.rst
docs/images/advanced.png
docs/images/advanced_options.png
docs/images/advanced_options_model_explorer.png
On my local machine:
W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git status --ignored --untracked-files=all --porcelain=2 | wc -l
41613
W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git ls-tree -r HEAD --name-only | wc -l
858
Having git list all untracked files (41k) to later filter the files from the repo was a bad idea (my bad).
Would calling git ls-tree -r HEAD --name-only to get the names of the tracked files, plus parsing the output of git status --untracked-files=all to get the untracked files (letting git itself filter out the ignored files), be a good approach?
Having git list all untracked files (41k) to later filter the files from the repo was a bad idea (my bad).
Can you post some timings as well? I wouldn't think listing 41k files would take too long (we're talking minutes if I recall our discussion in RC).
Would calling git ls-tree -r HEAD --name-only to get the names of the tracked files, plus parsing the output of git status --untracked-files=all to get the untracked files (letting git itself filter out the ignored files), be a good approach?
Not sure, why would that be faster? Currently we do a single git call (https://github.com/ESSS/esss_fix_format/pull/59). Do you think it is the parsing of that output that is causing the slowdown?
@ggrbill was having a 20 min delay when executing that one single call (git status --ignored --untracked-files=all --porcelain=2); his ignored "tmp" folder had over 2GB and the file count was well over 4 000 000 000.
Asking git to list only the untracked files (without --ignored) lets git itself filter out the ignored ones, so we never have to enumerate them and discard them afterwards.
ahh ok, got it, thanks.
So the proposal is to execute ff only on tracked files and untracked files, instead of on all files except ignored ones?
(tracked files) + (untracked but not ignored)
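To make the proposal concrete, here is a rough sketch of how the two git calls could be combined. It is not the actual esss_fix_format implementation: the helper names (parse_ls_tree, parse_untracked, files_to_format) are made up for illustration, and it assumes the commands are run from the repository root so that both outputs use the same relative paths. It passes -z to both commands so that paths with spaces or special characters are not quoted.

```python
import subprocess
from typing import List


def parse_ls_tree(output: str) -> List[str]:
    """Parse NUL-separated output of `git ls-tree -r HEAD --name-only -z`."""
    return [name for name in output.split("\0") if name]


def parse_untracked(output: str) -> List[str]:
    """Extract untracked paths from `git status --porcelain=2 -z` output."""
    untracked = []
    tokens = iter(output.split("\0"))
    for token in tokens:
        if token.startswith("2 "):
            # Rename/copy entries are followed by the original path as a
            # separate NUL-terminated field; skip it.
            next(tokens, None)
        elif token.startswith("? "):
            # Untracked entries look like "? <path>".
            untracked.append(token[2:])
    return untracked


def _git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", *args],
        cwd=repo,
        check=True,
        stdout=subprocess.PIPE,
        encoding="utf-8",
    )
    return result.stdout


def files_to_format(repo: str = ".") -> List[str]:
    """Return (tracked files) + (untracked but not ignored).

    Assumes `repo` is the repository root (hypothetical helper).
    """
    tracked = parse_ls_tree(
        _git(repo, "ls-tree", "-r", "HEAD", "--name-only", "-z")
    )
    # No --ignored here, so git does not report ignored files at all.
    untracked = parse_untracked(
        _git(repo, "status", "--untracked-files=all", "--porcelain=2", "-z")
    )
    return sorted(set(tracked) | set(untracked))
```

Calling files_to_format() at the repo root should then give the list of files to run ff on. Keeping the parsing separate from the subprocess calls makes it possible to test the parsing without a real repository.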