Lost comment on the last item of an array
Hello,
<?php
$a = [
    1, // Comment 1
    2, // Comment 2
];
When parsing the above example, Comment 2 is lost. I think this is because it would be attached to a potential third item in the array. It would be great if there were a way to recover it.
I know it is lost because I dumped the serialized form of the parsed statements, and it is not there.
This is the JSON dump, if it helps:
[
    {
        "nodeType": "Expr_Assign",
        "var": {
            "nodeType": "Expr_Variable",
            "name": "a",
            "attributes": {
                "startLine": 3,
                "endLine": 3
            }
        },
        "expr": {
            "nodeType": "Expr_Array",
            "items": [
                {
                    "nodeType": "Expr_ArrayItem",
                    "key": null,
                    "value": {
                        "nodeType": "Scalar_LNumber",
                        "value": 1,
                        "attributes": {
                            "startLine": 4,
                            "endLine": 4,
                            "kind": 10
                        }
                    },
                    "byRef": false,
                    "attributes": {
                        "startLine": 4,
                        "endLine": 4
                    }
                },
                {
                    "nodeType": "Expr_ArrayItem",
                    "key": null,
                    "value": {
                        "nodeType": "Scalar_LNumber",
                        "value": 2,
                        "attributes": {
                            "startLine": 5,
                            "comments": [
                                {
                                    "nodeType": "Comment",
                                    "text": "\/\/ Comment 1\n",
                                    "line": 4,
                                    "filePos": 21
                                }
                            ],
                            "endLine": 5,
                            "kind": 10
                        }
                    },
                    "byRef": false,
                    "attributes": {
                        "startLine": 5,
                        "comments": [
                            {
                                "nodeType": "Comment",
                                "text": "\/\/ Comment 1\n",
                                "line": 4,
                                "filePos": 21
                            }
                        ],
                        "endLine": 5
                    }
                }
            ],
            "attributes": {
                "startLine": 3,
                "endLine": 6,
                "kind": 2
            }
        },
        "attributes": {
            "startLine": 3,
            "endLine": 6
        }
    }
]
Thanks for your great work.
> When parsing the above example, Comment 2 is lost. I think this is because it would be attached to a potential third item in the array. It would be great if there were a way to recover it.
This is indeed the case. PHP-Parser always associates comments with the node that follows them; if there is no following node, they are lost.
However, there is a way to retrieve such comments manually with a bit of extra work. You need to enable token positions (the startTokenPos and endTokenPos attributes) as described in the lexer docs, and obtain the tokens by calling $lexer->getTokens() after parsing.
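As a rough sketch of that setup, assuming a PHP-Parser release in the 2.x to 4.x range (where the lexer takes a usedAttributes option and exposes getTokens()) and a Composer autoloader at vendor/autoload.php, it could look something like this. The autoload path and the $code sample are assumptions made for illustration; newer major versions expose tokens differently.

```php
<?php
// Assumption: PHP-Parser 2.x-4.x installed via Composer in ./vendor.
require 'vendor/autoload.php';

use PhpParser\Lexer;
use PhpParser\ParserFactory;

// Sample input taken from the report above.
$code = "<?php\n\$a = [\n    1, // Comment 1\n    2, // Comment 2\n];\n";

// Ask the lexer to record token offsets on every node, in addition to
// the comments and line attributes.
$lexer = new Lexer([
    'usedAttributes' => [
        'comments', 'startLine', 'endLine',
        'startTokenPos', 'endTokenPos',
    ],
]);

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7, $lexer);
$stmts = $parser->parse($code);

// Tokens of the code that was just parsed, in token_get_all() format.
$tokens = $lexer->getTokens();
```

The same $lexer instance has to be passed to the parser, otherwise getTokens() will not reflect the parsed code.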
Then you should be able to retrieve trailing comments like these using something like the following code:
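use PhpParser\Node;

// Returns the first comment that follows $node on its last line, or null.
// $tokens is the array returned by $lexer->getTokens().
function getTrailingComment(array $tokens, Node $node) {
    assert($node->hasAttribute('endTokenPos'));
    $pos = $node->getAttribute('endTokenPos');
    $endLine = $node->getAttribute('endLine');
    for (; $pos < count($tokens); ++$pos) {
        // Single-character tokens such as "," are plain strings; skip them.
        if (!is_array($tokens[$pos])) continue;
        list($type, $content, $line) = $tokens[$pos];
        // Stop once the scan moves past the node's last line.
        if ($line > $endLine) break;
        if ($type === T_COMMENT || $type === T_DOC_COMMENT) {
            return $content;
        }
    }
    return null;
}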
This code will return the first comment after the node that is still on the same line.
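To see why the same-line scan works, it may help to try it on the raw token stream without PHP-Parser. The following is only an illustration using PHP's built-in token_get_all() (PHP 7.1+ syntax); firstCommentOnLine is a hypothetical helper that mirrors the loop above, and the literal 2 is located by a plain token search rather than through a node's endTokenPos.

```php
<?php
// Sample input taken from the report above.
$code = "<?php\n\$a = [\n    1, // Comment 1\n    2, // Comment 2\n];\n";
$tokens = token_get_all($code);

// Hypothetical helper: first comment after token index $pos that starts
// on line $line, or null if the line ends without one.
function firstCommentOnLine(array $tokens, int $pos, int $line): ?string {
    for ($i = $pos + 1; $i < count($tokens); ++$i) {
        // "," and "]" come back as plain strings without line information.
        if (!is_array($tokens[$i])) continue;
        [$type, $content, $tokenLine] = $tokens[$i];
        if ($tokenLine > $line) break;
        if ($type === T_COMMENT || $type === T_DOC_COMMENT) {
            return $content;
        }
    }
    return null;
}

// Find the integer literal 2 and look for a comment on the same line.
$found = null;
foreach ($tokens as $i => $token) {
    if (is_array($token) && $token[0] === T_LNUMBER && $token[1] === '2') {
        $found = firstCommentOnLine($tokens, $i, $token[2]);
    }
}

// Before PHP 8.0 a single-line comment token includes its trailing
// newline, so trim it before displaying.
echo rtrim((string) $found), "\n";
```

Because the returned text may or may not carry the trailing newline depending on the PHP version, callers of either function will probably want to rtrim() the result.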
When I was processing line comments, it struck me that assigning them to the next statement is not what one would expect. Wouldn't it be better if they were statements/expressions in their own right?