PHP_CodeSniffer
PHP_CodeSniffer copied to clipboard
4.0 | Proposal: improvements to DocBlock tokenization
Current Situation
As things are, tokens within DocBlocks are tokenized as T_DOC_COMMENT_*
tokens.
Additionally, there are two tokens in a docblock which receive additional information:
- The
T_DOC_COMMENT_OPEN_TAG
token will have the following additional keys in the token array:-
comment_tags
:array<int>
- array with stack pointers to DocBlock tags -
comment_closer
:int
- stack pointer to the DocBlock closer
-
- The
T_DOC_COMMENT_CLOSE_TAG
token will have the following additional key in the token array:-
comment_opener
:int
- stack pointer to the DocBlock opener
-
The problem
Sometimes a sniff may want to listen for a specific Docblock token, like T_DOC_COMMENT_TAG
, and depending on the findings may then want to jump to the DocBlock opener/closer to check some additional info.
This is currently not possible.
The work-around is to always listen for a T_DOC_COMMENT_OPEN_TAG
token, loop through the comment_tags
to find the target(s), and jump to the tag from there. The comment_closer
info is then available to the sniff on the T_DOC_COMMENT_OPEN_TAG
token.
Alternatively, a sniff can walk to the start/end of the DocBlock from any token within, but depending on how extensive the available documentation is (including possible annotations for Doctrine, Psalm, PHPUnit), walking the tokens to the start/end of the DocBlock can easily mean having to walk hundreds of tokens.
Proposal
I'd like to propose adding the comment_opener
and comment_closer
indexes to all tokens within a DocBlock, so the information about the opener/closer is always available.
For those sniffs affected by the above outlined problem, this should allow for improving the performance.
Alternatives considered
Other token types with an *_opener
/*_closer
often have a token array index on the tokens between the "opener" and "closer" in array format with information about all applicable (potentially nested) openers/closers (or "owners") (also see the details in the fold-out below), however DocBlocks can not be nested, so putting the information about the opener/closer in an array does not make sense to me for DocBlocks.
Other considerations
For all other opener/closer type of indexes, the *_opener
/*_closer
indexes are consistently both added to both the opener as well as the closer.
At this time, the way the indexes are set for DocBlocks is not consistent with that as the indexes for DocBlocks are set asymmetrically. This proposal fixes that too.
Details on all opener/closer types and what info is available in the token array
Bracket type | Opener + closer has these indexes | And may have | Tokens between have | Notes |
---|---|---|---|---|
Parentheses | parenthesis_opener + parenthesis_closer |
parenthesis_owner |
nested_parenthesis (array<int opener => int closer> ) |
|
Curly braces (all) | bracket_opener + bracket_closer |
-- | -- | See #12 for a proposal to add a nested_brackets key |
Square brackets (all, including short arrays) | bracket_opener + bracket_closer |
-- | -- | See #12 for a proposal to add a nested_brackets key |
Curly braces for control structures | scope_opener + scope_closer |
scope_condition |
conditions (array<int owner => int|string owner code> ) |
|
Alternative syntax for control structures | scope_opener + scope_closer |
scope_condition |
conditions (array ) |
|
Heredoc/Nowdoc | scope_opener + scope_closer |
scope_condition |
conditions (array ) |
|
Attributes | attribute_opener + attribute_closer |
-- | attribute_opener , attribute_closer , nested_attributes (array<int opener => int closer> ) |
Yeah, not all that consistent, but never mind that for now. |
Comments | Opener: comment_closer , Closer: comment_opener |
-- | -- |
Unless otherwise annotated, these indexes will all be integer stack pointers.
Planning
While this isn't necessarily a breaking change, I still propose to include the change in the 4.0 release for safety reasons, as there may be sniffs which explicitly only expect a comment_opener
/comment_closer
index on the T_DOC_COMMENT_CLOSE_TAG
/T_DOC_COMMENT_OPEN_TAG
tokens respectively and those sniffs may need to be adjusted.
Opinions ?
Please let me know if you have any concerns about this proposal or any suggestions for further enhancements to the DocBlock tokenization.
/cc @asispts @dingo-d @fredden @GaryJones @greg0ire @kukulich @michalbundyra @Ocramius @sirbrillig @stronk7 @weierophinney @wimg