isort
isort copied to clipboard
skip_gitignore=true will break ignoring directories defined in skip during filesystem traversal
Description
When starting isort with --skip_gitignore, directories that should not be traversed because they are configured in skip will be traversed.
From what I've seen, this is purely a performance problem, so all files that should be skipped will be filtered out at some later point.
Instructions to reproduce
Minimal setup (note: "node_modules" is in the default skip values.)
# assuming isort 5.10.1 is installed (`pip3 install isort`)
mkdir test && cd test
git init
mkdir -p node_modules/this_should_not_be_accessed
touch test.py
strace isort . 2>&1 | grep this_should_not # as expected, not queried
strace isort --skip-gitignore . 2>&1 | grep this_should_not # suddenly, stat and open syscalls appear
On a side note: Add more files, and the node_modules folder will be queried more often:
touch test2.py
strace isort --skip-gitignore . 2>&1 | grep this_should_not # 2 stat and 2 open calls
touch test3.py
strace isort --skip-gitignore . 2>&1 | grep this_should_not # 3 stat and 3 open calls
adding node_modules to a .gitignore doesn't help, even though the directory should then be double-ignored by isort (this is #1895).
echo "node_modules" >> .gitignore
strace isort --skip-gitignore . 2>&1 | grep this_should_not # 3 stat and 3 open calls
Environment & Versions
Run on ubuntu21.10 with python3.9.7. Actual syscalls probably depend on the python interpreter, if you can't reproduce, I'll probably be able to figure out what python code causes the access.
Related Issues
Closely related to #1895, but if I undestand that issue correctly, it does not talk about skip at all, but just files from .gitignore. ~It seems to me that the current PR for #1895, #1900, would fix this.~
Possibly related #1762 and #1777, both read as if they were fixed.
Background
Yesterday, I realized that isort takes >90 seconds to run on one project where it processes 107 files with approximately 10k lines of code. Turns out, it doesn't need 90 seconds to process, but rather 89s to traverse what feels like the entire file system, only to then do its actual work in ~1s:
time isort --show-files . > isort_output.txt
real 1m35.831s
time cat isort_files.txt | xargs isort
real 0m0.896s
which is absurd, considering that black also finds the code files and finishes in 3 seconds. I wasn't able to get isort to skip reading directories it really doesn't need to access (venv, node_modules, web server static files, ...)
Yeah, this is highly related to https://github.com/PyCQA/isort/issues/1895 and probably resolved by https://github.com/PyCQA/isort/pull/1900.
It sounds like your specific issue is:
- isort has a default ignore list, including
node_modules - when you pass in
--skip-gitignore, it actually runs slower because it reads every file then filters afterwards - isort is still functionally correct when you pass in
--skip-gitignore, it just takes too long
#1900 is now merged, but iirc, we still have cases where we traverse through directories that are ignored, see my comment on the PR and the response to it.
Just encountered this same problem with isort 5.10.1. I have a data directory with millions of files. My .gitignore includes /data/ as one of the directories to skip. I see the following performance metrics:
$ isort --skip-gitignore . # 23 min
$ isort --extend-skip data . # 43 sec
If I delete the data directory, isort . takes < 2 seconds. Both 23 min and 43 sec are prohibitively long, I usually end up just not using isort and manually sorting my imports 😢