PHP_CodeSniffer icon indicating copy to clipboard operation
PHP_CodeSniffer copied to clipboard

Proposal: Language Server

Open ObliviousHarmony opened this issue 1 year ago • 14 comments

Is your feature request related to a problem?

After migrating from PHPStorm to VS Code I noticed that the available PHP_CodeSniffer extensions lacked meaningful integration with editor features.

  • phpcs: Despite being the most popular option, this extension only supports in-line warnings/errors. The extension provides no auto-fix or formatting options at all. It also runs phpcs synchronously for each file.
    • Although it continues to "work" depending on your use-case, this extension is currently unmaintained and abandoned.
    • There is a fork that has a large number of downloads but it is also abandoned.
  • PHP Sniffer & Beautifier: This extension provides support for in-line warnings/errors as well as document formatting. This extension does not support in-line fixes and selection document formatting.
  • PHP Sniffer: Another extension that provides in-line warnings/errors and full/selection document formatting. It does not support in-line fixes.
    • It supports selection document formatting by only running against the selected snippets and adding in a <?php tag. While in theory this is great, in practice, it means any warnings/errors registered by a token on an unselected line won't be included in the formatting.

Despite some getting most of the way there, none of these extensions fully utilize VS Code's language tools, nor do they provide feature parity with PHPStorm:

  • Line/File Ignore: Support for automatically adding phpcs:ignore comments on lines or files using a right-click context menu.
  • Individual Fixes: Fix selected problem(s) using a right-click context menu.
  • Performance: Even with a debounce, running phpcs against a large file with a lot of expensive sniffs can take a while.

To be clear, these are limitations imposed by phpcs. These extensions take advantage of the json report type and there is no other provided report that bridges the gap.

Describe the solution you'd like

As developers do, when I encountered these issues, my first instinct was (naturally) to create my own extension (obligatory xkcd). (Technically my first instinct was to contribute upstream but the extensions were all abandoned or tiny at the time.) This extension achieves parity with PHPStorm using a custom report integration that returns warnings/errors and supports both individual fixes and document/selection formatting. This approach allows for a great deal of control over the behavior of phpcs, however, it requires extending internal classes. While the classes I've extended seem relatively safe, it's not ideal for compatibility.

Given how ubiquitous VS Code has become (along with its myriad of popular forks), it seems reasonable for us to invest in providing better support for these kinds of tools. I've got a few possible approaches that I'd like to discuss:

Implement Custom Reports

The meat of my custom integration is an extended Fixer class and File class. It works by ignoring warnings/errors that aren't specifically relevant to the requested report. For instance, individual fixes work by ignoring all warnings/errors other than the given sniff source and token.

The least contentious option would be to add similar functionality directly within PHP_CodeSniffer. This is agnostic to any specific editor or extension and opens the door for others to provide the same functionality anywhere they require it. The main negative I see with this approach is that it would only provide support with the latest version of phpcs. This means any extension would still have to support the old report style moving forward.

Another avenue here could be to add this functionality in a separate Composer package. My extension does something similar to support use-cases where phpcs is run on a guest OS via Docker or similar. This could be bundled with anything that wanted to use this extended functionality.

Language Server

One caveat to distributing a package (or bundling reports) is that it still requires custom extensions to be developed for each editor that wants to use the functionality. Microsoft's Language Server Protocol was created to provide developers a consistent interface for providing language features to any editor. We could provide a language server and then any extension would be free to use it.

Integrated Server

The most robust option is to build a language server into phpcs directly. My assumption is that deeply integrating like this would allow us to take advantage of the benefits of having a long-lived process. For example, could we use an in-memory cache of open files with their tokens and associated messages to save time? I'm not totally sure and if this is something we're interested in then we should profile phpcs and see what kind of savings we would get if we eliminated stuff we can cache.

The caveat is that this feels like a pretty big undertaking. There's a package with Language Server Protocol DTOs but we would still need to add our own RPC reading/writing. This also has the same problem of bundling in that it's only available for the latest version of phpcs.

Separate Server

Microsoft provides a framework for building language servers that takes care of the server/client communication and server lifecycle. All we would need to do is provide an interface between phpcs and the language server. This is essentially what my extension does already, however, it uses VS Code's API directly rather than providing an LSP interface.

The problem with a separate server is that compatibility becomes an issue again. In order to avoid bundling code in the latest phpcs release we would need to provide a custom report type for the server. As the "official" language server we would need to make maintaining compatibility a priority. I haven't thought deeply about this in a while but one option could be a hybrid approach. We could start shipping a language server report in phpcs and provide any necessary polyfills for older versions in the language server. This gives us forward compatibility (since the report and all related modifications are bundled) as well as backward compatibility (a little hacky perhaps but these versions aren't going to ever change).


I lean really heavily towards building a separate language server. It provides the most comprehensive compatibility and if we start bundling the language server report generation we can also get the benefits of deeper integration. As for the server itself, we can take advantage of asynchronous workers and caching to make it as performant as possible. In my experience this has worked out really well.

To be clear, I am totally on board with building the language server. I've had a lot of fun working on my VS Code extension and Microsoft's language server framework is pretty much 1:1 with VS Code's API. There's a few differences and I'd use the opportunity to re-write everything but I don't see this being a significant time investment.

ObliviousHarmony avatar Oct 23 '24 00:10 ObliviousHarmony

@ObliviousHarmony Thanks for opening this issue and I applaud your efforts in creating a VSCode integration.

In my opinion, facilitating the ability for IDEs to create integrations is inside the scope of the PHPCS project (but only just and only when the changes are reasonable). Anything else, like actually providing integrations with IDEs is out of scope.

PHPCS is first and foremost a stand-alone CLI tool. The fact that popular IDEs want to provide support for the tooling is great and welcome, however for any such integration the functionality provided by PHPCS should be leading, not the other way around.

As a side-note: the text in your issue is confusing to me: who are the "we" you keep talking about ?

Implement Custom Reports

It is quite unclear to me what you are proposing here. A custom report, but what is the custom report supposed to do ?

As a rule of thumb, new reports will rarely be accepted. Custom reports can already be provided via an external standard since PHPCS 3.3.0: squizlabs/PHP_CodeSniffer#1948 (and it seems you figured out how to get that working already).

Language Server

Again, it is unclear to me what you are proposing. By the looks of it, there are already several PHP language servers available.

Also, refactoring PHPCS to support a single IDE is completely out of the question.

take advantage of the benefits of having a long-lived process

I must be missing something, but as I said before PHPCS is first and foremost a stand-alone CLI tool, so I honestly don't see how this would apply to PHPCS.

If anything, I urge you to contact the PHPStorm team and teams working on integrations with other IDEs and to come up with something which will be usefull and usable by all. I can even imagine you all creating a (report) protocol which could be used by most, if not all, PHP static analysis tools, including PHPStan, CS-Fixer, Phan etc Preferably, such a project would then also be jointly maintained by you all.


I get the impression that in for both proposals, it's about getting a report for only a small portion of the code in a file. Is that correct ?

In that case, these proposals are a dead-end. Files should always be scanned as a whole as sniffs will often look at the larger context of a file. Some errors may be reported on line 1 if they apply to the file as a whole, which is problematic when reports don't take the whole file into account.

The main negative I see with this approach is that it would only provide support with the latest version of phpcs. This means any extension would still have to support the old report style moving forward. ... This also has the same problem of bundling in that it's only available for the latest version of phpcs.

That will always be the case for changes in PHPCS itself.

All in all, nothing I've read here reads like something which is in-scope for the PHPCS project.

jrfnl avatar Oct 23 '24 01:10 jrfnl

Thanks for the feedback @jrfnl!

As a side-note: the text in your issue is confusing to me: who are the "we" you keep talking about ?

Sorry, "we" is "me!" I used to work for an executive that was obsessed with everyone always using the royal "we". I guess it just ended up sticking with me. You're the first person to ever question it :smile:

It is quite unclear to me what you are proposing here. A custom report, but what is the custom report supposed to do ?

My calling it a "custom report" was overly simplistic. It's an integration that extends phpcs with a custom report that reads input from an environment variable and processes requests from the extension. The primary purpose of the integration is to provide a more granular level of control over the fixes that are applied. As a quick example, this request fixes a single problem at a specific line:column position. It then returns the sections of the file that have changed with the new content so that the extension can apply it.

This is made possible by extending PHP_CodeSniffer\Files\File with my own. One of the neat things I noticed was that addMessage() could be extended and I could reject messages that weren't relevant to the current request. This opened the door to applying single sniffs (performance optimized) and fixing only sections of the file.

By the looks of it, there are already several PHP language servers available.

Sorry, I should have been more clear. A "language server" in this context is a service that provides language features, such as linting and formatting in our case. It communicates with clients using the Language Server Protocol and as such is editor-agnostic. LSP is supported by most editors including Visual Studio Code, Sublime, JetBrains IDEs, vim, and more. You just willed it into existence :smile:

What I am proposing is the creation of a PHPCS Language Server NPM package. Using Microsoft's Node.js LSP server, I can create a service that interfaces with PHPCS via the CLI. It passes the documents via stdin with appropriate arguments and then supplies clients with the diagnostics, inline fixes, and formatting.

Also, refactoring PHPCS to support a single IDE is completely out of the question.

That would be absolute insanity!

I get the impression that in for both proposals, it's about getting a report for only a small portion of the code in a file. Is that correct ?

Taking a step back, the primary purpose of the integration was to provide support for a --range option that does not exist. This option would accept a {startLine}-{endLine} input with an optional column using {line:column}. This gets you the ranged formatting; combining it with --sniffs will get the single-sniff fixes. This seems like it aligns pretty well with PHPCS being a CLI tool first and foremost. As an added bonus, if it supports multiple ranges, it can be used to target specific sections of code.

Files should always be scanned as a whole as sniffs will often look at the larger context of a file. Some errors may be reported on line 1 if they apply to the file as a whole, which is problematic when reports don't take the whole file into account.

While true, if the caller says they only care about problems in a specific range, is it really a issue?

One thing I want to note is that the approach taken in my integration still processes the entire file. By returning prematurely in File::addMessage(), we avoid registering the message and no problems are reported. When fixing, sniffs rely on File::addFixableWarning() and File::addFixableError() returning true when the fix should be applied. Since File::addMessage() returns false for messages out of the --range, the fix is not applied. Thanks to persistence of tokens too, the range won't be affected by changes made by fixes in a single fix cycle. There's still an edge case with multiple cycles, however, limiting ranges to a single fix cycle per call might be reasonable.

ObliviousHarmony avatar Oct 23 '24 07:10 ObliviousHarmony

I've also been recently looking into this and I think a good VSCode extension could be built if PHPCS provided the following changes:

  1. For the json report output for phpcs, add a parameter for "line_end" and "column_end". As far as I can tell with my limited experience, phpcs probably has some kind of tokening system, so these values would be the end of the relevant token for the sniff. These values could then be used to provide the underlines for errors / warnings. Currently there's not an easy way to tell what part should be underlined, only where the underline should start. This seems to be what @ObliviousHarmony is asking for as well, however, my view is that this would be used only to decorate a file (e.g. provide underlines), not to limit reporting to a single section.
  2. For PHPCBF, create an option to provide a json report. Currently there's not a way to run the formatter without affecting the file in question whilst also getting the output of the formatting. This would be similar to the ESLint CLI's --fix-dry-run option. This would need some work on the first party side because phpcbf currently doesn't support reports other than diff, so creating a custom report is not an option. a. Summary information (total files fix, total issues fixed, issues remaining etc.) b. For each file, it would provide a summary of info for that file (same as above but for that file) c. For each file, it would provide the fixed file resulting output as a string instead of updating the file directly. I can see a similar report exists for producing diff reports or for creating suffixed files, so hopefully it's as easy as redirecting the output after fixing?

This should allow for a VSCode extension (or other tooling) to use the JSON output and easily provide the ability to lint and format files by using the output from the CLI, and perhaps is more in scope than the changes requested by @ObliviousHarmony?

mikeybinns avatar Apr 04 '25 19:04 mikeybinns

Hey @jrfnl!

I use @ObliviousHarmony's extension in vscode and additionally I do work on the psalm extension for vscode that talks to psalms agnostic LSP which is what @ObliviousHarmony is talking about. The LSP which would be inside of code sniffer would be agnostic according to the Microsoft agnostic definitions of an LSP. Any client supporting the LSP standard could connect to an LSP server and many support that today such as neovim. Vs code. Php storm and the like

Getting this into the core of code sniffer would be great

tm1000 avatar Sep 24 '25 22:09 tm1000

One of the things that eventually occurred to me @tm1000 is that a language server for PHPCS might be better left outside of the CLI tool. There are a bunch of different versions of PHPCS out there being used in the wild and building an extension that relies on always using the latest would be really exclusionary. When I manage to find the time to do so, I plan to create a language server based on my extension and consume that instead. I would still really be in favor of having upstream support for better integration with a language server, however, I haven't hit any compatibility issues with my extension that make me feel it's explicitly necessary.

ObliviousHarmony avatar Sep 25 '25 06:09 ObliviousHarmony

@ObliviousHarmony I was thinking the same too if it was possible. In the cast of Psalm the LSP is part-in-part with the core code.

tm1000 avatar Sep 25 '25 17:09 tm1000

@ObliviousHarmony @tm1000 @mikeybinns Looks like there is some discussion going on about the possibility of having an official PHP Language Server at PHP internals.

Might be interesting for you to follow that discussion & contribute to it if you can: https://externals.io/message/129036

jrfnl avatar Nov 01 '25 05:11 jrfnl

@jrfnl just a few comments but an official language server would not negate the need for an LSP in PHP_CodeSniffer (as it would also not eliminate the need for an LSP in Psalm & PHPStan)

We could bolt on an LSP in PHP_CodeSniffer without affecting the main cli as was done in Psalm.

PHP_CodeSniffer doesnt need to do anything other than tell the user the errors it finds and format files, all user configurable.

Psalm and PHPStan and Phan provide more than that (which is why they are mentioned in these discussions) because they are able to infer generics and docs such as array<int> but I dont think that needs to be the role of PHP_CodeSniffer at all.

tm1000 avatar Nov 03 '25 18:11 tm1000

@tm1000 Well unless and until someone can actually make it concrete what is expected in PHPCS, this issue will never be addressed.

What I've gathered so far, is that a custom report would be needed. Well, that doesn't need to be in PHPCS itself. Custom reports can be hosted in another package and PHPCS fully supports that.

Other than that, from what I understand at this time, the aim is to only get errors/warnings for the "changed lines". That shows a fundamental lack of understanding of how PHPCS works and is never going to happen.

jrfnl avatar Nov 03 '25 18:11 jrfnl

That shows a fundamental lack of understanding of how PHPCS works and is never going to happen.

Just being honest but this comment is rather rude and off putting.

Throughout this entire thread the three of us are never on the same page (actually I think ObliviousHarmony and I are on the same page). The determination of this project to just continue to say "no no no" when the scope of this would be entirely on other people is interesting. The other continued responses of "never going to happen" in at least two replies now is also telling.

If you'd simply look at what @ObliviousHarmony has already created and done for IDEs and that many users already use it and it additionally works with php code sniffer already you'd see how it works. It's already working right now. People are using it. There is a need already. He just wants to see if it can be a part of the main project but that's increasingly just getting "no never" with no real thought (imho)

It's open source. So all good. Moving on to other things

(PS I do want to thank you, Juliette, for taking over the project. You're doing good/great things. All good if we see differently on a subject)

tm1000 avatar Nov 03 '25 18:11 tm1000

@tm1000 I think you missed my first comment:

Well unless and until someone can actually make it concrete what is expected in PHPCS, this issue will never be addressed.

I still don't feel that anyone has been able to demonstrate what would be needed and I readily admit that I don't fully understand what the ask is here.

Until that "ask" becomes clear, no decision can ever be taken (nor any action).

jrfnl avatar Nov 03 '25 19:11 jrfnl

@jrfnl

Before I start, the Language Server Protocol (LSP) is a spec describing how a development tool can interact with a separate long-running service to receive diagnostic information about a given document. A raw text file is sent to the language server and it does whatever it needs to. This can mean static analysis but it can also mean linting.

What I've gathered so far, is that a custom report would be needed. Well, that doesn't need to be in PHPCS itself. Custom reports can be hosted in another package and PHPCS fully supports that.

In the interest of some clarity, if you don't mind, I'd like to take another pass at explaining what this comment tried to. I'd just like to make sure that I'm being as clear as possible.

Current Integration

[!NOTE] The current custom report code API does not provide the diagnostic information and formatting customization required to fully implement a language server. I'd imagine that the main reason JetBrains' IDEs are able to get around that limitation is because it includes an incredibly robust intellisense platform.

Although I do make use of custom reports, in order to get additional functionality out of PHPCS, I am extending internal PHPCS classes and re-tokenizing the document. At its core, I am abusing addMessage() and dropping messages that don't meet the criteria of the current report being used.

The other tweak I needed was a custom Fixer class. Rather than trying to analyze a diff to generate the list of changes for a given message, I extract the specific changesets for each autofix. For message-level fixes, it even utilizes the token's pointer index to uniquely identify each message to completely prevent off-target autofixes.

A custom report then bootstraps my PHPCS code and re-scans using my special File and Fixer classes. I exclusively use the phpcs binary file and never touch phpcbf so that I can avoid the internal changes caused by the formatter code path.

While this works really well, it's very fragile and comes with quite a performance cost for unnecessarily re-scanning the document.

A language server inside of PHP_CodeSniffer would be a long-running process that keeps tokens in memory with associated messages. Using the LSP, PHPCS would provide this information to clients and support both message-level autofixes and document formatting. The server runs asynchronously and allows for a very responsive IDE integration.


I've had quite a long time to think about this and I don't know if it's necessary. Any time a document changes it still requires re-tokenization unless we implemented robust diff-based token updates. This wouldn't even be that helpful since we'd still need to re-scan the entire document because sniffs can apply messages anywhere in the document. It's just a guess but I doubt we'd see enough of a performance increase to justify implementing and maintaining something so complicated.

What would be nice is if we could get the above File and Fixer features available using CLI options. I would be interested in implementing two options that alter the behavior of the tool:

  • --range: Limits the registration of messages based on line[:column]-line[:column]. This works with both phpcs and phpcbf because we're just ignoring all messages outside of the range.
  • --message-id: On addMessage(), whether it gets ignored by --range or not, record and increment an ID so that each message can be externally identified. Add this to any reports where it makes sense.

I would also be interested in expanding Fixer to allow tracking token replacements and reverts. As an added bonus, this would make it possibly to generate a diff report without requiring that consumers have the tool installed :smile:

These enhancements would make it possible for a separate server application to bridge the LSP client with PHPCS. This doesn't need to live in this repo and it doesn't need to be associated with the project at all.

ObliviousHarmony avatar Nov 03 '25 21:11 ObliviousHarmony

This wouldn't even be that helpful since we'd still need to re-scan the entire document because sniffs can apply messages anywhere in the document.

--range: Limits the registration of messages based on line[:column]-line[:column]. This works with both phpcs and phpcbf because we're just ignoring all messages outside of the range.

@ObliviousHarmony Can you explain to me how the above two statements could ever work together ? To me, they are in complete conflict, which is exactly what I've pointed out before.

As explained before, both here as well as in multiple other issues, while some sniffs may only look at a limited range of tokens, a significant number of sniffs do not. If you change something on line 50, this may cause a new warning from a sniff to appear on line 20.

Something like --range would disregard that completely, which will mean that we end up in a situation where the IDE is reporting that everything is fine (based on the changed lines), while when PHPCS is then run from the command line in CI, new issues will be reported and CI will fail.

To me, that feels like something which will only frustrate end-users and will cause a significant additional burden of support requests/complaints for this repo.

jrfnl avatar Nov 03 '25 21:11 jrfnl

As explained before, both here as well as in multiple other issues, while some sniffs may only look at a limited range of tokens, a significant number of sniffs do not. If you change something on line 50, this may cause a new warning from a sniff to appear on line 20.

What I'm proposing has nothing to do with document scanning, it's about the registration of messages.

When the user looks at the warning/error in their IDE or CLI it will be on line 20, not line 50. When they highlight a range that includes line 20 or select that specific message, they expect that problem to be fixed. They don't need to know that it's going to change something on line 100, they just need it to automatically fix the violation marked on line 20.

The sniff will call $phpcsFile->addMessage() for a stack position on line 20 when it finds the violation on line 50. Any message outside of the --range selection or with the wrong --message-id will be ignored. This is the same way that specific codes, severities, and file paths exclusions are currently handled. The above filtering will also prevent any of those violations from being fixed unless selected.

It doesn't matter why a sniff decided to put a violation on line 20. All the user cares about is making the violation on line 20 go away. Even in cases where the context is complicated and would impact line 50 when applied, from the user's perspective, there's no error on line 50; there's only an error on line 20.

ObliviousHarmony avatar Nov 03 '25 22:11 ObliviousHarmony