noverify icon indicating copy to clipboard operation
noverify copied to clipboard

Add a notion of "script" and handle them in a special way

Open quasilyte opened this issue 5 years ago • 4 comments

In PHP, there is no easy way to distinguish entry points ("scripts") and library code.

This difference is important for the analysis, because scripts may reside in the same folder while being unrelated. They can define the same types and functions, one of them will override another's symbols since NoVerify doesn't treat them in a special way.

Given this layout:

src/script1.php
src/script2.php
src/lib.php

NoVerify will be launched over src (source root) and if script1 and script2 have conflicts, the behavior is not very well-defined, it depends on the file traversal order.

The proposed solution is to handle "scripts" in such a way that we store their symbols locally and don't use in the global analysis purposes.

  • When we analyze script1, we see symbols from script1+lib
  • When we analyze script2, we see symbols from script2+lib

How we identify a file as a script? If any of these apply, file is considered to be a script:

  • It has control flow statements (for/if/goto/etc) on the top level.
  • It has print/echo/exit/die on the top level.
  • It has try/catch/throw on the top level.

quasilyte avatar Sep 02 '19 17:09 quasilyte

I suggest that you mark these files explicitly. Because libs can do some weird stuff like defining constants, registering shutdown functions, etc that do not cause immediate side-effects and that does not make them "scripts". Also PHPStorm, for example, as far as I know, does not have notion of "entrypoints" and considers all global variables to be always defined, for example. That was the inspiration for noverify. Linter should prevent code like this from being written, not the other way around :).

YuriyNasretdinov avatar Sep 02 '19 17:09 YuriyNasretdinov

I'm generally OK with both approaches. Need more feedback from others.

CC @atercattus, @whisk.

quasilyte avatar Sep 02 '19 17:09 quasilyte

The downside of explicit marking is that most users won't mark their entry points/scripts, so NoVerify would behave in sub-optimal way by default. There is no conventional way to mark them, so we also need to invent some and ask users to use it (otherwise they can see some issues).

The downside of auto-inference is implicitness and fragility to corner cases. Though I can confirm that proposed heuristics work well for most files I've seen.

quasilyte avatar Sep 02 '19 18:09 quasilyte

Heuristics like these are fragile, unintuitive and lead to frustration when small changes in a file could to huge changes in the analysis result and users probably don't expect that kind of behaviour. Also in most modern PHP frameworks there usually is a couple of entry points (for web) and a predictable location for CLI scripts so it shouldn't be hard for most users to specify where their entry points are in a project.

YuriyNasretdinov avatar Sep 02 '19 18:09 YuriyNasretdinov