PHP-Parser
Hex chars greater than \x7f silently abort parsing
Hi. I ran into an issue regarding hex chars in a double quoted string. If I have a piece of code like the following:
<?php
$a = "\x6f";
I get as a result the following:
[{"nodeType":"Stmt_Expression","expr":{"nodeType":"Expr_Assign","var":{"nodeType":"Expr_Variable","name":"a","attributes":{"startLine":2,"endLine":2}},"expr":{"nodeType":"Scalar_String","value":"o","attributes":{"startLine":2,"endLine":2,"kind":2}},"attributes":{"startLine":2,"endLine":2}},"attributes":{"startLine":2,"endLine":2}}]
But if the variable holds a value greater than \x7f, I get an empty array as a result and no error. Any ideas? Thank you!
The problem here is probably in the JSON encoding. JSON only allows valid UTF-8 in strings, and a lone byte greater than \x7f is not a valid UTF-8 sequence.
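To illustrate the point above, here is a minimal sketch using only PHP's built-in `json_encode()`; it does not involve PHP-Parser itself. The variable names are made up for the example, and the `JSON_THROW_ON_ERROR` flag assumes PHP 7.3 or later.

```php
<?php
// A lone byte above \x7f is not valid UTF-8, so json_encode() rejects it.
$valid   = json_encode(['value' => "\x6f"]);   // {"value":"o"}
$invalid = json_encode(['value' => "\x80"]);   // false, and no exception by default
$error   = json_last_error();                  // JSON_ERROR_UTF8

// With JSON_THROW_ON_ERROR (PHP >= 7.3) the failure is no longer silent.
$thrown = false;
try {
    json_encode(['value' => "\x80"], JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    $thrown = true;
}
```

Because `json_encode()` returns `false` by default instead of raising an error, checking its return value or passing `JSON_THROW_ON_ERROR` is one way to surface the failure.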
@nikic I don't understand your answer here. A string in PHP is an array of bytes, so any valid byte values are allowed. The problem is that you're representing it as a string in JSON, instead of as an array of numbers.
Any update on this issue?
Nope. Any suggestions on what to do about this?
Before converting the AST to JSON, iterate through all nodes and encode any value containing an invalid UTF-8 string using base64_encode.
Any suggestions on what to do about this?
The problem is that you're representing it as a string in JSON, instead of as an array of numbers.
So represent it as that. A PHP string is not an array of Unicode characters, it's just an array of bytes.
So stop trying to convert an arbitrary sequence of bytes into UTF-8.
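The "array of numbers" representation suggested here could look roughly like the sketch below. The helper names `bytesToJson` and `jsonToBytes` are hypothetical and not part of PHP-Parser; only the built-in `pack()`, `unpack()`, `json_encode()`, and `json_decode()` are used.

```php
<?php
// Hypothetical helpers, not part of PHP-Parser.
function bytesToJson(string $bytes): string
{
    // unpack('C*') yields a 1-indexed array of unsigned byte values;
    // array_values() reindexes it so it encodes as a JSON list.
    return json_encode(array_values(unpack('C*', $bytes)));
}

function jsonToBytes(string $json): string
{
    // Rebuild the original byte string from the list of integers.
    return pack('C*', ...json_decode($json, true));
}

$json = bytesToJson("\x80\xff");   // [128,255]
$back = jsonToBytes($json);        // the original two bytes
```

This round-trips any byte sequence, at the cost of a much more verbose and less readable JSON output for ordinary text.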
Before converting the AST to JSON, iterate through all nodes and encode any value containing an invalid UTF-8 string using base64_encode.
That sounds reasonable. We can add two extra visitors for encoding/decoding all strings in base64. It's unfortunate that this is necessary, but I don't really see a way around it.
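As a rough illustration of the encode/decode idea, the sketch below walks a plain nested array standing in for the AST, rather than real PHP-Parser nodes or visitors. Unlike the proposal above, it only wraps strings that are not valid UTF-8, which is an assumption made to keep ordinary strings readable. The `__base64` marker key and the function names are hypothetical.

```php
<?php
const B64_KEY = '__base64'; // hypothetical marker key, not a PHP-Parser convention

function isUtf8(string $s): bool
{
    // preg_match() with the /u flag fails on malformed UTF-8 subjects.
    return preg_match('//u', $s) === 1;
}

// Replace every non-UTF-8 string with a small wrapper holding its base64 form.
function encodeBinary($value)
{
    if (is_array($value)) {
        return array_map('encodeBinary', $value);
    }
    if (is_string($value) && !isUtf8($value)) {
        return [B64_KEY => base64_encode($value)];
    }
    return $value;
}

// Reverse step: unwrap the marker arrays back into raw byte strings.
function decodeBinary($value)
{
    if (is_array($value)) {
        if (array_keys($value) === [B64_KEY]) {
            return base64_decode($value[B64_KEY]);
        }
        return array_map('decodeBinary', $value);
    }
    return $value;
}

$node = ['nodeType' => 'Scalar_String', 'value' => "\x80"];
$json = json_encode(encodeBinary($node));
$back = decodeBinary(json_decode($json, true));
```

The sketch does not handle array keys that are themselves invalid UTF-8, and a real implementation inside the library would presumably operate on node objects instead of arrays.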
Bump on this error :smiley: Will a fix be deployed?