flow-development-collection
                                
                                 flow-development-collection copied to clipboard
                                
                                    flow-development-collection copied to clipboard
                            
                            
                            
                        Proxy compilation breaks on certain strings
Description
If a class subject to proxy compilation contains certain strings, the proxy compilation will interpret them as the class / interface name and throw an exception saying, that the class name does not match the file name.
This happens for all strings (as in collection of characters, not actual PHP strings) in a file that follow the following rules:
- After the line break there must only be lower case characters or whitespace
- Must contain class somethingorinterface something
Steps to Reproduce
- Create class TestClasscontaining one of the examples below in a package
- Trigger compilation by running flow:cache:flushandflow:core:compilecommands
- Observe the following exception: The name of the class "bar" is not the same as the filename which is "TestClass.php"
Example classes
Breaks
<?php declare(strict_types=1);
namespace Vendor\Foo\Bar;
class TestClass
{
    public function doSomething()
    {
        echo '
            only whitespace and lowercase class foo
        ';
    }
}
Breaks
<?php declare(strict_types=1);
namespace Vendor\Foo\Bar;
/*
class foo
*/
class TestClass
{
}
Does not break: String prefixed by upper case character
<?php declare(strict_types=1);
namespace Vendor\Foo\Bar;
class TestClass
{
    public function doSomething()
    {
        echo '
            Uppercase class foo
        ';
    }
}
Does not break: String prefixed by special character *
<?php declare(strict_types=1);
namespace Vendor\Foo\Bar;
/*
* class foo
*/
class TestClass
{
}
Expected behavior
Proxy should compile sucessfully, the flow:core:compile command should not have any output.
Actual behavior
Exception  The name of the class "bar" is not the same as the filename which is "TestClass.php" is thrown.
Affected Versions
I have only tested this on my current setup, which uses the following versions:
Neos: 4.3.6 Flow: 5.3.7
Additional information
So far I have tracked this issue down to the regex /^([a-z\s]*?)(final\s+)?(interface|class)\s+([a-zA-Z0-9_]+)/m used in Neos\Flow\ObjectManagement\Proxy\Compiler::cacheOriginalClassFileAndProxyCode which seems to be a bit to lax:
Oh boi, using regex for parsing semantics is always doomed to fail at some point. Yes, the regex is definitely "too lax", but really, you can't correctly parse a context sensitive language (PHP - "class" inside a string or comment does not mean the same as "class" outside it) with a simple regular language.
Not sure how to effectively fix this, other than using correct PHP language parsing. Or maybe we could use Ocramius' proxy manager (https://github.com/Ocramius/ProxyManager) to solve our needs, not sure - that would definitely need some investigation.
Thanks a lot for digging into this and describing the issue so well!
Another alternative would be using an actual parser (e.g. https://github.com/nikic/PHP-Parser ) to parse the file contents and extract the class name that way while sticking with the current proxy logic.
As I said in chat, that is on my list since ages. Would be great to do it. I think my last quick test seemed to indicate some performance problems with this. but it's a long while ago. Maybe by now that is no problem anymore.
I think I'd prefer using proxy-manager if possible, as that already handles all the gotchas with generating full proxy classes correctly for specific common use cases and is well tested. Also, it's already a transitive dependency.
As far as I can see, the proxy-manager doesn't generate the code itself (at least not the part we'd need) but uses laminas/laminas-code. Still, using these libraries would not solve the original issue at hand, because we still need take care of renaming the proxy classes ourselves in order to provide functionality like just doing a $foo = new Foo where Foo is the auto-generated proxy.
I tried to improve the Compiler code to be more resilient against code like the one reported by @j6s.
Best counter-argument from Robert against ProxyManager is the missing original class renaming (though that could somehow be solved) to allow new OriginalClass() to get a proxied instance and the missing flexibility to overwrite methods. Also, the generated class will only contain the proxy code, so you end up with having to autoload two files (depending on the impact of parsing time vs. file access might be negligible, but I know some OS where any additional file access is prohibitively costly).
btw heredoc and nowdoc breaks too of course.
how about a regex like this? its more stable but it will fail eventually...
(?:
  "[^"]*"                  # simple demo regex for string - should be extended to allow escapes
  |'[^']*'
  |\/\*[\s\S]*?\*\/        # regex to match comment
  |<<<([A-Z]+)[\s\S]+\1    # match heredoc, should be extended to match also nowdoc
)(*SKIP)(*FAIL)            # if any of the preceding matches skip and dont care about the match.
|^([a-z\s]*?)(final\s+)?(interface|class)\s+([a-zA-Z0-9_]+)
https://regex101.com/r/i1sj1A/1