phpstan-drupal
phpstan-drupal copied to clipboard
Baseline for Drupal metadata
This was initially mentioned in #818, but since it closed with the PR automatically and I can't reopen it, I'll duplicate my proposal/question.
Problem/motivation
Currently, some Drupal-specific information is collected through parsing the codebase or is simply hardcoded. This can actually lead to unwanted problems and potentially create bugs in the code instead of finding them.
A good example of this is services. The extension simply hardcodes some of them (like synthetic ones) that are defined in Drupal runtime and parses *.services.yml files to collect information about the others. This opens a whole new world of bugs and uncertainty.
Here are several examples that come to mind:
-
The extension treats all
*.services.ymlfiles as given. This means that even services from disabled modules are considered valid. If you reference a service from a disabled module in your code, PHPStan will not detect the problem, because it assumes the service is available. -
It cannot detect services registered during service container compilation (in runtime). Anything added through a Compiler Pass is considered non-existent in this case.
-
Similar to the previous point, it cannot detect services that are removed or altered from the container for any reason during a Compiler Pass.
-
It only supports specific features for services, such as decorators and aliases. The problem is that some of these features are now available through PHP attributes, which introduces a new issue. Consider this example, which will not be processed properly:
Drupal\foo\Bar: {}#[AsDecorator(decorates: 'baz_service', priority: -100)] final readonly class Bar { public function __construct( #[AutowireDecorated] private Baz $inner, ) {} } -
Or a new Drupal hook system based on PHP attributes. The entire src/Hook directory is essentially registered as services. You could technically parse the whole folder and achieve the same result, but why? Every time something similar happens in Drupal, it will require emulating that in the extension.
To support all these cases, the extension would essentially have to start emulating all this functionality on its own to understand what's going on. However, some parts are simply unrealistic to emulate, such as a Compiler Pass, which can significantly alter the container based on other factors like the database or specific environment variables. It's simply unrealistic to support all this functionality through emulation, as the amount of work required would be insane.
Proposed solution
This is nothing new in the PHP world: PHPStorm does this (phpstorm-meta), PHPStan does this (baseline). We can build a baseline/environment-knowledge/drupal-meta (it doesn't matter what to call it). The main idea is to create a command for the extension that will dump all the necessary information from the runtime directly.
Let's take services as an example. This command can simply load the service container from the Drupal database and then dump it in a PHP file with an array inside (exactly as it done by Chi-teck/drupal-code-generator), where all the necessary information for the extension will be stored. Such as the service name, the assigned class, whether it's a decorator, and everything like that.
Yes, it would require some manual work, and yes, at least for the first steps, it should be optional. But it solves basically all the problems. You don't need a properly bootstrapped Drupal for that in CI, because it will be in the repository as a phpstan-drupal-baseline.php, for example. It is much easier to parse, it contains all the relevant information about the project. It must be much easier to write rules for such a file than it is now, when you have to write a lot of other parsers, finders, etc.
If this approach works well, it can be extended to other subsystems, such as entity types. With dumped data, it can be properly checked in multiple cases (since entities can be altered, but the extension can't detect it).
Theoretically, this approach has great potential regarding fields. If the extension knows exactly which entity has which fields of which types, whether they are optional, what entity reference type they have, what FieldItemInterface the field type is using, and everything like that, it will likely detect many new problems in the codebases. Not to mention, it will allow writing much simpler code without worrying about phpstan-ignore-next-line simply because it doesn't understand what ->first() means for a specific field of a specific entity type.
With proper metadata, it would be possible to understand what ->first()->get('value')->getValue() means (DataType API). Currently, this forces developers to write a huge wall of code with asserts or ignore the call completely, but it could contain bugs!
For example: $node->get('field_tags')->first()->get('entity')->getValue(). Even if the field is required, it doesn't mean that the referenced tag is still present. The getValue() should return ?TermInterface in that case (which will highlight problems if there is no check for entity in the code), but it's currently impossible to explain this to PHPStan.
Having detailed metadata would allow PHPStan to correctly interpret such complex chains of method calls and provide accurate type information at each step, significantly improving code quality and reducing potential bugs.
This method could significantly improve the analysis capabilities and reduce the need for manual exceptions in the code.
This is just a proposal and some thoughts, not a call to action. It feels like the current approach is quite limited and, in fact, much more expensive to maintain and extend.
I'm personally in the camp that this is the wrong direction for PHPStan Drupal to go. The about of the PHPStan repo explains what the PHPStan name stands for:
PHP Static Analysis Tool - discover bugs in your code without running it!
The problem and solutions you're describing require to move away from static analysis and towards a sort of hybrid approach. However, this only works in cases where there's a 1-to-1 relationship between code and runtime configuration.
For Open Social and other productized codebases this kind of set-up would be really problematic because there is no 1 correct configuration of modules and services. Instead, things should work in many different configurations.
What I would love to see instead, is that Drupal itself might move more things from its database to a statically analysable place. For example, rather than having to write a compiler pass to do something, allow us to express it in our services.yml file or in a new YAML file.
For Open Social and other productized codebases this kind of set-up would be really problematic because there is no 1 correct configuration of modules and services. Instead, things should work in many different configurations.
I do understand that as well, and I have no answer to that. I agree, this proposal is not a silver bullet that solves all the problems. This is why I proposed this to be an option (disabled by default). This can be good for a custom project, but can be tricky or impossible for Drupal core. Like PHPStorm meta, which is basically of no use in Drupal core. Because it assumes that it either should be generated with all core modules enabled, or it simply won't work properly. But having such metadata with everything enabled is also an edge case, which doesn't reflect the actual environment.
What I would love to see instead, is that Drupal itself might move more things from its database to a statically analysable place. For example, rather than having to write a compiler pass to do something, allow us to express it in our services.yml file or in a new YAML file.
This would be preferable, but the problem is I don't think this is possible. Simply because Compiler Pass can have complex logic. If I remember correctly, Drupal doesn't even support the %env(FOO)% notation for service container parameters, which forces writing service providers that alter container based on environment variables.
As I mentioned in the original post, there are some features now that take advantage of PHP attributes, like with decorators. This simplifies the code significantly, making it more verbose and self-describing about what it's doing, without the need to investigate the *.services.yml file. However, it brings a lot of issues for such tools, meaning this package should start parsing reflections of all services to detect such logic. There are also service collectors and other tools from Symfony that already work in Drupal, intentionally or not. I don't think it's realistically possible to maintain all that stuff here, as some of them are simply impossible and not available without a database connection.
As an alternative, this could be done in a separate project as a PoC to see how it goes and what new problems appear. But at the current state, I think with time going by, this project will detect less and less problems/bugs, simply because it lacks information about the actual state of the codebase. Or the maintenance of all those tools and parsers might become a huge burden.
But it's very interesting to hear how we can solve some of these problems. Any input is very appreciated.
While writing this, I understood that we can't even parse Drupal plugins reliably, even though it sounds easy. Derivers can break all the logic, which might be impossible to get just by static analyzing it, since derivers can be provided based on data from the database.
I think especially around the container (with things like env support) a lot of this is being discussed within Drupal in https://www.drupal.org/project/drupal/issues/3228629 and its child issues.
I think https://www.drupal.org/project/drupal/issues/3488989 is the most direct version of it where there's also an outline of "why" things live in the database rather than on disk. However, if the approach suggested by Chi can be followed of enabling disk-based dumping with a database fallback, then that would fix the container part of this ticket :D
I'd love to see efforts invested in those Drupal core issues, rather than adding to the maintenance and complexity of a tool like PHPStan Drupal. Fixing those core issues has immediate DX implications too, even outside of static analysis.