vip-scanner icon indicating copy to clipboard operation
vip-scanner copied to clipboard

Migrate to a real PHP parser

Open ockham opened this issue 9 years ago • 5 comments

(Note: I'm a Code Wrangler in trial starting on 2014-11-10, and this is the project I've been assigned by @nb.)

This issue is supposed to serve as a place for discussion and as a roadmap, which I'll update based on my progress and the comments I get. So please share your thoughts on this, your input is valuable to me as I try to lay this out!

Quoting @nb:

The end goals are two:

  • Make writing rules easier
  • Make the parsing layer more stable and maintainable

I'm currently acquainting myself with the code. In order to replace parts of it, I will have to write tests (#3) to cover all checks in order to make sure I'm not breaking things along the way without even noticing. OTOH, some checks (especially some regular expressions) are currently just insufficient, either missing dangerous stuff (#194), or producing false positives (#18). Both can probably be fixed by using an actual parser, or at least a tokenizer.

Preliminary research on PHP parsers written in PHP yields this SO question. The most promising candidate found there is https://github.com/nikic/PHP-Parser, which seems actively maintained and licensed under the 3-clause BSD license (which is GPL compatible). The stable version only supports PHP >= 5.3. Question: Is this sufficient, or do we need to support 5.2, too?

ockham avatar Nov 10 '14 22:11 ockham

PHP-Parser does look promising, and more user-friendly than parsing the token stream directly.

Only supporting PHP >= 5.3 is perfectly fine.

nickdaugherty avatar Nov 10 '14 22:11 nickdaugherty

If we go with the "matching wpcom" we can go up to 5.5, Not that I think many things go to 5.5 but I love some of the 5.4 features (okay mostly short array syntax)

sboisvert avatar Nov 10 '14 22:11 sboisvert

If we go with the "matching wpcom" we can go up to 5.5

There are still some 5.4 servers.

joshbetz avatar Nov 10 '14 22:11 joshbetz

some checks (especially some regular expressions) are currently just insufficient, either missing dangerous stuff, or producing false positives.

Because of the way we use the scanner, false positives are really bad. False negatives (missing stuff) isn't great, but we still review all code before deploying to WordPress.com, so it's more acceptable. Obviously more accurate tests are always welcome.

Sounds like an awesome project!

joshbetz avatar Nov 10 '14 22:11 joshbetz

@ockham: Can we consider this as closed?

fklein-lu avatar Jul 06 '15 06:07 fklein-lu