tolerant-php-parser
tolerant-php-parser copied to clipboard
Output of `token_get_all()` depends on php build and config options for short tags
$ php --no-php-ini -d short_open_tag=0 -r 'echo json_encode(token_get_all("<? php"));'
[[321,"<? php",1]]$
$ php --no-php-ini -d short_open_tag=0 -r 'echo json_encode(token_get_all("<? php")) . "\n";'
[[321,"<? php",1]]
$ php --no-php-ini -d short_open_tag=1 -r 'echo json_encode(token_get_all("<? php")) . "\n";'
[[379,"<?",1],[382," ",1],[319,"php",1]]
$ php --no-php-ini -d short_open_tag=0 -r 'echo json_encode(token_get_all("<?php?>")) . "\n";'
[[321,"<?php?>",1]]
$ php --no-php-ini -d short_open_tag=1 -r 'echo json_encode(token_get_all("<?php?>")) . "\n";'
[[379,"<?",1],[319,"php",1],[381,"?>",1]]
This seems to be related to the options used to build PHP See https://secure.php.net/manual/en/ini.core.php#ini.short-open-tag and https://secure.php.net/manual/en/language.basic-syntax.phptags.php
PHP also allows for short open tag <? (which is discouraged since it is only available if enabled using the short_open_tag php.ini configuration file directive, or if PHP was configured with the --enable-short-tags option).
This also causes two test failures in programStructure21.php.tree and programStructure13.php.tree, and causes that test to save a different .tree to disk (Which may then get accidentally be configured)
- See programStructure21.php and programStructure13.php.
- Maybe detect this ini setting and conditionally skip those test.
This can be checked by
ini_get('short_open_tag') === 1. This was seen in my PHP build from source, which used travis/compile.sh (ini_get() returned that even when started with--no-php-ini, but didn't when-d short_open_tag=0was provided)
Impact:
- Test failure is inconvenient for developers working on project
- The tolerant-php-parser won't work as well on codebases with short open tags if a PHP binary that isn't configured to support short open tags is used to analyze it.
(I don't think this is a common use case. This issue affects
<?, not<?=)
Possible Remediations:
- ini_set() can't be used
- May be able to split the output of token_get_all() into smaller tokens and process the smaller tokens, if a token begins with
<?(Might do the wrong thing for non-PHP) - Document and accept this bug?
- Refuse to start server unless
short_open_tagis disabled via ini setting (checkable via ini_get()) (Or optionally restart binary under the hood)
Looking at similar projects, nikic/php-parser also uses token_get_all - in that issue, a user wanted to enable short_open_tags to parse a project using short open tags https://github.com/nikic/PHP-Parser/issues/290
I'm considered proposing an RFC adding new flags such as TOKEN_ENABLE_SHORT_OPEN_TAG/TOKEN_DISABLE_SHORT_OPEN_TAG for token_get_all as a php ast - it seems doable to temporarily override it (move from CG(short_tags) to a new setting SCNG(short_tags) in Zend/zend_language_scanner.l)
- Allow analyzers to parse/analyze/lint projects targeting a deployment environment supporting short open tags/no short open tags, regardless of what the user configured locally
- Make it more convenient to programatically convert short open tags to
<?phpin migration scripts - Allow linters/scripts to easily warn about code with short open tags that would be T_INLINE_HTML when short tags are disabled
- Without moving from compiler globals to a copy in SCNG, CG(short_tags) might stay changed if there is an error (e.g. out of memory for
memory_limit) during a call to token_get_all()