phpunit icon indicating copy to clipboard operation
phpunit copied to clipboard

GH-6114: Static path matching

Open dantleech opened this issue 9 months ago • 13 comments

This (early stage) WIP PR introduces a static path matcher which intends to emulate the behavior of the PHPUnit FileIterator in order to prevent PHPUnit traversing the filesystem when a deprecation is triggered.

The PHPUnit FileIterator uses glob to find directories and we therefore need to support the glob patterns - which can vary according to the platform. This PR uses https://man7.org/linux/man-pages/man7/glob.7.html as a reference in addition to testing the behavior locally to confirm assumptions.

The webmozart/glob provides a similar feature however it's behavior is different as it supports curly braces, and * is restricted to a single directory level, while * in PHPUnit will return all descendants and I'm sure there are other differences - however I've used that as a starting point.

TODO:

  • [x] Escaping unhandled regex characters, or escaping the regex before parsing the pattern.
  • [x] Character classes [:alnum:] etc.
  • [ ] Collating symbols
  • [ ] Equivalence class expressions
  • [x] Emulating glob behavior of unterminated [ character groups.
  • [ ] Seems to be a PHPUnit globstar bug whereby /a** will match /b and all other directories, where as /ab* will not match anything. We can either copy that behavior or "fix" it.

~and maybe writing the implementation from scratch if regex turns out to be a bad fit.~

Usages on Github:

  • wildcard: 1.8k https://github.com/search?q=directory+%22*%22+path%3A*.xml+path%3A**%2Fphpunit.xml+language%3AXML&type=code
  • globstar: 378: https://github.com/search?q=directory+%22**%22+path%3A*.xml+path%3A**%2Fphpunit.xml+language%3AXML&type=code
  • Character range/class (directory.*\[): 0: https://github.com/search?q=%2Fdirectory.%5C%5B%2F+path%3A.xml+path%3A**%2Fphpunit.xml+language%3AXML&type=code
  • ? any char: 0 https://github.com/search?q=%2Fdirectory.%5C%5B%2F+path%3A.xml+path%3A**%2Fphpunit.xml+language%3AXML&type=code

dantleech avatar Mar 10 '25 12:03 dantleech

Have refactored to tokenize the glob string, while less performant it's easier to reason about and we only need to compile the regex for each <include or <exclude - not for each path.

dantleech avatar Mar 12 '25 11:03 dantleech

@sebastianbergmann this PR is indicative of the approach but needs the final 20% of work:

  • Refactor the source filter/mapper tests to create the paths they need in a temporary directory? this would make it easier to test edge cases without contaminating other tests and grant mode confidence.
  • Implement the exclude/include logic (either by compiling that into the regex or as a separate concern).
  • We can implement or forbid the use of [[:character-ranges:]] and other "esoteric" glob patterns.

But I'm keen for any thought you have on the approach so far - the biggest concern for me is providing confidence that it will be compatible with at least 99.99% of configuration files.

dantleech avatar Mar 18 '25 10:03 dantleech

@sebastianbergmann this PR is indicative of the approach but needs the final 20% of work:

* Refactor the source filter/mapper tests to create the paths they need in a temporary directory? this would make it easier to test edge cases without contaminating other tests and grant mode confidence.

* Implement the exclude/include logic (either by compiling that into the regex or as a separate concern).

* We can implement or forbid the use of `[[:character-ranges:]]` and other "esoteric" glob patterns.

Thank you for your work so far. I am afraid that I might not be able to look into this before the PHPUnit Code Sprint in April in Berlin. There I will be able to look into this, for sure, and together with other developers. I hope I can look into this before then, but I cannot promise that.

But I'm keen for any thought you have on the approach so far - the biggest concern for me is providing confidence that it will be compatible with at least 99.99% of configuration files.

We could delay this until PHPUnit 13 and break backward compatibility (for edge cases, "esoteric" glob patterns, etc.).

sebastianbergmann avatar Mar 18 '25 10:03 sebastianbergmann

Is it expected that the tests are currently failing?

sebastianbergmann avatar Apr 03 '25 04:04 sebastianbergmann

Yes - the prefix/suffix filtering hasn't been added yet and the tests fail due to the tests we added upstream. The PR is at an inflection point:

  • Do we want to commit to supprting all the glob features? or could we simplify it.
  • Do we want to refactor the source-finding/filtering tests to be more dynamic?[1]
  • Are we happy with the general approach in this PR?

And then I was also considering how to implement the suffix/prefix logic - as it probably makes sense to include them in the compiled regex - but then it intrudes on the "copy the behavior of glob" approach" - so if we decide to simplify the globbing behavior it makes that decision easier too.

[1] currently all the source mapping/filtering tests share the same fixture which makes it hard to test edge cases without breaking other tests - I think it could generate the file tree instead.

dantleech avatar Apr 03 '25 06:04 dantleech

I discussed this today with @edorian, @theseer, and @sebastianheuer at the PHPUnit Code Sprint. Even for more complex <source> configurations, we were unable to measure a runtime of more than 150 ms for the current implementation of SourceMapper::map(). We do not think that a potential reduction in runtime would justify a more complicated implementation.

Can you share how much time is spent in SourceMapper::map() in your environment and what the <source> section of your configuration looks like?

sebastianbergmann avatar Apr 26 '25 12:04 sebastianbergmann

Hi @sebastianbergmann - this is the repoducer repository - on my local envrionment it pauses for around 3-4 seconds while loading the source map.

dantleech avatar Apr 26 '25 12:04 dantleech

Hi @sebastianbergmann - this is the repoducer repository - on my local envrionment it pauses for around 3-4 seconds while loading the source map.

Thank you. How much does the time spent go down with your new implementation?

sebastianbergmann avatar Apr 26 '25 13:04 sebastianbergmann

filtering without file traversal basically removes the performance issue (the performance is similar to not having a source-map at all). the only issue really is replicating the glob behavior - or deliberately breaking compatiblity and simplifying the behavior to some extent.

dantleech avatar Apr 26 '25 13:04 dantleech

Thank you. I have to say, though, that I do not think that 3-4 seconds is too bad considering there are 120K files in the reproducer. I am now less convinced than I originally was that this is an issue worth fixing, sorry.

sebastianbergmann avatar Apr 26 '25 13:04 sebastianbergmann

On the project for which I reported this issue it takes 1 minute and 22 seconds. We've had to remove the source mapping entintirely in order to be able to use PHPUnit. I will endeavour to update the reproduction repository to replicate that.

dantleech avatar Apr 26 '25 13:04 dantleech

I thought I'd share that this is also a problem for others who are running PHPUnit on relatively large codebases. We have ~9.2k files and executing this loop takes ~8 seconds on my machine the first time E_USER_DEPRECATED is emitted. It's not the biggest issue in the world for us and the workaround is to fix all deprecation warnings (easier said than done) but addressing this would really improve our quality of life. Thank you for your consideration!

TomA-R avatar Jun 08 '25 03:06 TomA-R

@sebastianbergmann and I had a call and decided to simplify the approach in the PR to support the bare minumum glob syntax and that it would be part of the next major release and merged around December time.

dantleech avatar Jun 18 '25 15:06 dantleech

Once we have the new implementation, we could introduce it into the current stable version in addition to the old implemention. When old and new implementation "disagree" we could then emit a deprecation warning.

sebastianbergmann avatar Jun 23 '25 06:06 sebastianbergmann

@dantleech Did you have a chance to work on this yet? Thanks!

sebastianbergmann avatar Sep 23 '25 12:09 sebastianbergmann

hi, no not yet, need to find some time

dantleech avatar Sep 28 '25 13:09 dantleech

@dantleech FYI: main is not PHPUnit 13.0-dev (where these changes should land). This hopefully makes working on this easier/simpler/more convenient.

sebastianbergmann avatar Oct 02 '25 09:10 sebastianbergmann

would you be option to using a third-party package, namely webmozarts/glob? We could evaluate it with that implementation at least I guess.

dantleech avatar Oct 02 '25 12:10 dantleech

Sure, we can evaluate @webmozart's package. I am usually reluctant to add a new dependency (that I did not create / do not maintain myself), but if this package provides what we need and is maintained on the PHP versions we need then I do not see a problem.

sebastianbergmann avatar Oct 02 '25 14:10 sebastianbergmann

Replaced by #6398

dantleech avatar Oct 25 '25 13:10 dantleech