noverify
noverify copied to clipboard
Add a notion of "script" and handle them in a special way
In PHP, there is no easy way to distinguish entry points ("scripts") and library code.
This difference is important for the analysis, because scripts may reside in the same folder while being unrelated. They can define the same types and functions, one of them will override another's symbols since NoVerify doesn't treat them in a special way.
Given this layout:
src/script1.php
src/script2.php
src/lib.php
NoVerify will be launched over src
(source root) and if script1
and script2
have conflicts, the behavior is not very well-defined, it depends on the file traversal order.
The proposed solution is to handle "scripts" in such a way that we store their symbols locally and don't use in the global analysis purposes.
- When we analyze
script1
, we see symbols fromscript1
+lib
- When we analyze
script2
, we see symbols fromscript2
+lib
How we identify a file as a script? If any of these apply, file is considered to be a script:
- It has control flow statements (
for/if/goto/etc
) on the top level. - It has
print/echo/exit/die
on the top level. - It has
try/catch/throw
on the top level.
I suggest that you mark these files explicitly. Because libs can do some weird stuff like defining constants, registering shutdown functions, etc that do not cause immediate side-effects and that does not make them "scripts". Also PHPStorm, for example, as far as I know, does not have notion of "entrypoints" and considers all global variables to be always defined, for example. That was the inspiration for noverify. Linter should prevent code like this from being written, not the other way around :).
I'm generally OK with both approaches. Need more feedback from others.
CC @atercattus, @whisk.
The downside of explicit marking is that most users won't mark their entry points/scripts, so NoVerify would behave in sub-optimal way by default. There is no conventional way to mark them, so we also need to invent some and ask users to use it (otherwise they can see some issues).
The downside of auto-inference is implicitness and fragility to corner cases. Though I can confirm that proposed heuristics work well for most files I've seen.
Heuristics like these are fragile, unintuitive and lead to frustration when small changes in a file could to huge changes in the analysis result and users probably don't expect that kind of behaviour. Also in most modern PHP frameworks there usually is a couple of entry points (for web) and a predictable location for CLI scripts so it shouldn't be hard for most users to specify where their entry points are in a project.