tolerant-php-parser icon indicating copy to clipboard operation
tolerant-php-parser copied to clipboard

Output of `token_get_all()` depends on php build and config options for short tags

Open TysonAndre opened this issue 8 years ago • 1 comments
trafficstars

$ php --no-php-ini -d short_open_tag=0 -r 'echo json_encode(token_get_all("<? php"));'
[[321,"<? php",1]]$                     
$ php --no-php-ini -d short_open_tag=0 -r 'echo json_encode(token_get_all("<? php")) . "\n";'
[[321,"<? php",1]]
$ php --no-php-ini -d short_open_tag=1 -r 'echo json_encode(token_get_all("<? php")) . "\n";'
[[379,"<?",1],[382," ",1],[319,"php",1]]
$ php --no-php-ini -d short_open_tag=0 -r 'echo json_encode(token_get_all("<?php?>")) . "\n";'
[[321,"<?php?>",1]]
$ php --no-php-ini -d short_open_tag=1 -r 'echo json_encode(token_get_all("<?php?>")) . "\n";'
[[379,"<?",1],[319,"php",1],[381,"?>",1]]

This seems to be related to the options used to build PHP See https://secure.php.net/manual/en/ini.core.php#ini.short-open-tag and https://secure.php.net/manual/en/language.basic-syntax.phptags.php

PHP also allows for short open tag <? (which is discouraged since it is only available if enabled using the short_open_tag php.ini configuration file directive, or if PHP was configured with the --enable-short-tags option).

This also causes two test failures in programStructure21.php.tree and programStructure13.php.tree, and causes that test to save a different .tree to disk (Which may then get accidentally be configured)

  • See programStructure21.php and programStructure13.php.
  • Maybe detect this ini setting and conditionally skip those test. This can be checked by ini_get('short_open_tag') === 1. This was seen in my PHP build from source, which used travis/compile.sh (ini_get() returned that even when started with --no-php-ini, but didn't when -d short_open_tag=0 was provided)

Impact:

  • Test failure is inconvenient for developers working on project
  • The tolerant-php-parser won't work as well on codebases with short open tags if a PHP binary that isn't configured to support short open tags is used to analyze it. (I don't think this is a common use case. This issue affects <?, not <?=)

Possible Remediations:

  • ini_set() can't be used
  • May be able to split the output of token_get_all() into smaller tokens and process the smaller tokens, if a token begins with <? (Might do the wrong thing for non-PHP)
  • Document and accept this bug?
  • Refuse to start server unless short_open_tag is disabled via ini setting (checkable via ini_get()) (Or optionally restart binary under the hood)

TysonAndre avatar Sep 20 '17 04:09 TysonAndre

Looking at similar projects, nikic/php-parser also uses token_get_all - in that issue, a user wanted to enable short_open_tags to parse a project using short open tags https://github.com/nikic/PHP-Parser/issues/290

I'm considered proposing an RFC adding new flags such as TOKEN_ENABLE_SHORT_OPEN_TAG/TOKEN_DISABLE_SHORT_OPEN_TAG for token_get_all as a php ast - it seems doable to temporarily override it (move from CG(short_tags) to a new setting SCNG(short_tags) in Zend/zend_language_scanner.l)

  • Allow analyzers to parse/analyze/lint projects targeting a deployment environment supporting short open tags/no short open tags, regardless of what the user configured locally
  • Make it more convenient to programatically convert short open tags to <?php in migration scripts
  • Allow linters/scripts to easily warn about code with short open tags that would be T_INLINE_HTML when short tags are disabled
  • Without moving from compiler globals to a copy in SCNG, CG(short_tags) might stay changed if there is an error (e.g. out of memory for memory_limit) during a call to token_get_all()

TysonAndre avatar Sep 25 '22 14:09 TysonAndre