esss_fix_format

Performance problem when many ignored files are present in the repo

Open prusse-martin opened this issue 4 years ago • 4 comments

Related to #58

Listing all ignored files just to filter them out later can have a big performance impact when an ignored folder contains many files. Maybe we should call ls-tree to obtain the file list instead:

W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git ls-tree -r HEAD --name-only | head -n 15
.coveragerc
.project
.pydevproject
.vscode/launch.json
.vscode/settings.json
.vscode/tasks.json
alfasim_gui.spec.yml
dist/all/.gitignore
docs/.gitignore
docs/ALFAsim_Technical_Manual___EN_US.pdf
docs/conf.py
docs/gui.rst
docs/images/advanced.png
docs/images/advanced_options.png
docs/images/advanced_options_model_explorer.png

On my local machine:

W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git status --ignored --untracked-files=all  --porcelain=2 | wc -l
41613

W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git ls-tree -r HEAD --name-only | wc -l
858

Having git list all untracked files (41k) to later filter the files from the repo was a bad idea (my bad).

Would it be a good approach to call git ls-tree -r HEAD --name-only to get the names of the tracked files, and to parse the output of git status --untracked-files=all to get the untracked files, letting git itself filter out the ignored files?
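A minimal Python sketch of that idea, to make the proposal concrete. The function names (parse_ls_tree, parse_untracked, run_git, files_to_format) are hypothetical and not taken from esss_fix_format. It assumes the commands run from the repository root and that paths contain no characters git would quote (otherwise the -z variants would be needed).

```python
import subprocess
from typing import List


def parse_ls_tree(output: str) -> List[str]:
    """Return the paths printed by `git ls-tree -r HEAD --name-only`."""
    return [line for line in output.splitlines() if line]


def parse_untracked(output: str) -> List[str]:
    """Return untracked paths from `git status --porcelain` (v1) output.

    Untracked entries are prefixed with "?? ". Ignored files are not
    listed at all because `--ignored` is not passed.
    """
    return [line[3:] for line in output.splitlines() if line.startswith("?? ")]


def run_git(args: List[str], cwd: str) -> str:
    """Run a git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def files_to_format(repo_dir: str) -> List[str]:
    """(tracked files) + (untracked but not ignored), sorted and de-duplicated."""
    tracked = parse_ls_tree(
        run_git(["ls-tree", "-r", "HEAD", "--name-only"], repo_dir)
    )
    untracked = parse_untracked(
        run_git(["status", "--untracked-files=all", "--porcelain"], repo_dir)
    )
    return sorted(set(tracked) | set(untracked))
```

One limitation of this sketch: a file that was staged with git add but never committed is neither in the HEAD tree nor reported as "??", so it would be missed unless the other status entries are parsed as well.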

prusse-martin avatar Jan 29 '21 20:01 prusse-martin

Having git list all untracked files (41k) to later filter the files from the repo was a bad idea (my bad).

Can you post some timings as well? I wouldn't think listing 41k files would take too long (we're talking minutes if I recall our discussion in RC).

Would it be a good approach to call git ls-tree -r HEAD --name-only to get the names of the tracked files, and to parse the output of git status --untracked-files=all to get the untracked files, letting git itself filter out the ignored files?

Not sure, why would that be faster? I mean, currently we do a single git call (https://github.com/ESSS/esss_fix_format/pull/59); do you think it is the parsing of that output that is causing the slowdown?
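To tell the time spent inside git apart from the time spent parsing its output, the git calls could be timed on their own. A rough sketch, where time_command is a hypothetical helper and not part of esss_fix_format:

```python
import subprocess
import sys
import time
from typing import List, Tuple


def time_command(cmd: List[str], cwd: str = ".") -> Tuple[float, int]:
    """Run `cmd` and return (elapsed seconds, number of output lines)."""
    start = time.perf_counter()
    result = subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, check=True
    )
    elapsed = time.perf_counter() - start
    return elapsed, len(result.stdout.splitlines())
```

Calling it once with ["git", "status", "--ignored", "--untracked-files=all", "--porcelain=2"] and once with ["git", "ls-tree", "-r", "HEAD", "--name-only"] in the same checkout would give comparable numbers for the current and the proposed listing.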

nicoddemus avatar Feb 01 '21 12:02 nicoddemus

@ggrbill was having a 20min delay when executing that single call (git status --ignored --untracked-files=all --porcelain=2); his ignored "tmp" folder had over 2GB and the file count was well over 4 000 000 000. Asking git to list only the tracked and untracked files, without --ignored, should let it skip the ignored files instead of enumerating every one of them.

prusse-martin avatar Feb 01 '21 13:02 prusse-martin

ahh ok, got it, thanks.

So the proposal is to only execute ff on tracked files except untracked files, instead of executing on all files except ignored ones?

nicoddemus avatar Feb 01 '21 15:02 nicoddemus

(tracked files) + (untracked but not ignored)

prusse-martin avatar Feb 01 '21 15:02 prusse-martin