Since version 1.24.0 finding files recursively is broken
I believe PR #283 introduced a flaw in finding files recursively when specifying a directory search on the command line. Specifically you can no longer check files only within a specific subdirectory.
The code in the PR strips out the directory information so after that you can no longer make any decisions based on full directory path.
The set of command below demonstrates the issue.
:~/tmp% mkdir yaml-test
:~/tmp% cd yaml-test
:~/tmp/yaml-test% cat >test.yaml
testing: [
:~/tmp/yaml-test% mkdir sub
:~/tmp/yaml-test% cp test.yaml sub
:~/tmp/yaml-test% yamllint .
./test.yaml
1:1 warning missing document start "---" (document-start)
2:1 error syntax error: expected the node content, but found '<stream end>' (syntax)
./sub/test.yaml
1:1 warning missing document start "---" (document-start)
2:1 error syntax error: expected the node content, but found '<stream end>' (syntax)
:~/tmp/yaml-test% cat >.yamllint
extend: default
yaml-files:
- sub/*.yaml
:~/tmp/yaml-test% yamllint .
:~/tmp/yaml-test% yamllint -v
yamllint 1.25.0
Thanks for the report, I can confirm that it worked with 1.23.0, but not with 1.24.0 and versions above.
My understanding of the problem is that https://github.com/adrienverge/yamllint/pull/283 was incorrect, and we should revert it. @dafyddj @jsok what do you think?
The problem comes from the fact that python-pathspec configured with gitwildmatch, ['*.yaml'] matches files like dir.yaml/file.sql:
>>> import pathspec
>>> p = pathspec.PathSpec.from_lines('gitwildmatch', ['*.yaml'])
>>> p.match_file('file.yaml')
True
>>> p.match_file('dir/file.sql')
False
>>> p.match_file('dir.yaml/file.sql')
True
I don't know whether it's the expected behavior, so I asked at https://github.com/cpburnz/python-path-specification/issues/41.
This problem was reported again (in #390). I still think that #283 introduced a bigger problem than the one it solved, and we should revert it. @jsok @dafyddj @andrewimeson what do you think?
It's hard for me to estimate which has more impact, but based on the reports for each issue, I think that the one that #283 fixes is more rare.
It would be ideal if we could figure out how to solve #279 without breaking the directory matching, but in the meantime reverting #283 makes sense to me.
@andrewimeson thanks for your feedback. Solving #279 without breaking the directory matching seems possible, but complex (see discussion at https://github.com/cpburnz/python-path-specification/issues/41).
We can wait for other users' feedback, but if someone wants to solve #334 by reverting commit a221898, I'm +1 to merge it.
This problem was reported again (in #546).
Now that pathspec has a new GitIgnoreSpec class to fix the original problem, the solution could be to cancel #283 and use the new pathspec class. For example :
- remove addition of os.path.basename() from commit a221898,
- switch from pathspec.PathSpec to pathspec.GitIgnoreSpec,
- require pathspec ≥ 0.10.0 in setup.cfg,
- add new cases in tests/test_cli.py, including examples for real-life issues #334, #390 and #546.
Unfortunately the author of the problematic PR @jsok doesn't answer. To anyone who wants to tackle the problem: contribution welcome!
If #283 is broken then revert it 😃 I did my best to add test cases but it's clearly caused a regression that no test caught. It would be wise to add the scenarios outlined in this issue as test cases to avoid future regressions.