language: Select language based on longest matching path extension
Closes #8408
Closes #10997
This is a reboot of my original PR from last year. I believe that I've addressed all the comments raised in that original review, but Zed has changed a lot in the past year, so I'm sure there will be some new stuff to consider too.
- updates the language matching and lookup to consider not just "does the suffix/glob match" but also "... and is it the longest such match"
- adds a new
LanguageCustomFileTypesstruct to pass user globs from settings to the registry - minor/unrelated: updates a test for the JS extension that wasn't actually testing what is intended to
- minor/unrelated: removed 2 redundant path extensions from the JS lang extension
Languages that may use this
- Laravel Blade templates use the
blade.phpcompound extension -
apparently Angular uses
component.html- see also https://github.com/zed-industries/extensions/issues/169
-
hypothetically someone could publish a "JS test" extension w/ custom highlights and/or snippets; many JS tests use
test.jsorspec.js
Verifying these changes
I added a number of assertions for this new behavior, and I also confirmed that the (recently patched) Laravel Blade extension opens as expected for blade.php files, whereas on main it does not.
cc @maxbrunsfeld (reviewed my original PR last year), @osiewicz and @MrSubidubi (have recently been in this part of the code)
Release Notes:
- Added support for "compound" file extensions in language extensions, such
blade.phpandcomponent.html. Closes #8408 and #10997.
| Messages | |
|---|---|
| :book: |
This PR includes links to the following GitHub Issues: #10765, #https://github.com/zed-industries/extensions/issues/169
If this PR aims to close an issue, please include a |
Generated by :no_entry_sign: dangerJS against c87f7e266067333b01ecda58fa646901e7301952
@maxbrunsfeld @osiewicz Nudging this, gently and politely. 😄 I've been using it w/o issue for the past few weeks, it will close 2 long standing issues, and staff have demonstrated interest in this feature/change several times in the past.
Input from @MrSubidubi would be welcome too, of course, esp as this change interacts with your recent changes in this area.
Thank you for taking a peek, if you're able!
Small and potentially slightly off-topic coment:
I would possibly argue that in the long-run, glob patterns should also be available for language matching. There are some languages that would benefit from glob matches already and the ambiguity of file_suffixes is probably somthing you had to deal with a lot here.
Generally, I think it might be safe to say that if a glob pattern matches, it might be a better match than just a suffix match and we could potentially also incorporate https://github.com/zed-industries/zed/pull/11697#discussion_r1605598630 more easily. Admittedly, I might also be missing something here, feel free to point that out should that be the case.
But then, I would guess that this is not something for now but rather something to keep in mind for later. With the tests you added, I think it would be easy to add this in later whilst ensuring that we do not regress on previous behavior.
Other than that, it does look good to me.
glob patterns should also be available for language matching
Yes, I found myself wondering the same thing, but also felt that it was out of scope here.
Thank you for checking this out!
Thanks for following up @claytonrcarter . Reviewing now; I might push some minor stylistic changes.
@maxbrunsfeld Thank you for the review. I think I understood what you are asking for, but please correct me if I'm wrong:
My initial implementation was focused on comparing the length of the "pattern" that was being used for matching (ie the file extension or glob pattern), but this latest version is now storing the length of what that pattern actually matched (ie from the list of extension/filename/path in language_registry::language_for_file_internal().
An example:
- extension 1 is looking for
path_suffixes = ["php"] - extension 2 is looking for
path_suffixes = ["blade.php"] - I open a file called
templates/index.blade.php
Both extension 1 and extension 2 will be compared (in language_registry::language_for_file_internal()) against a list built from the file name, eg ["php", "index.blade.php", "templates/index.blade.php"]
In my previous version, I was comparing the length of the matched element from path_suffixes; so 3 for ext 1 (because "php" matched the filename) and 9 for ext 2 (because "blade.php" matched the filename).
In this version, we should be comparing the length of the matched element from language_registry::language_for_file_internal();
- ext 1 will match the first element (because
"php" == "php) and will score 3 - ext 2 will not match the first element, but will match the second, (because
"index.blade.php".ends_with("blade.php")) and will score 15
I hope that this is close to what you are asking for; please advise it it's not! The tests are passing on this, and I've tested it a little just now, but I'll put it through its paces tomorrow and see if any issues pop up.
Thank you again for the review on this!
I’ve been using this latest implementation for the past day or two and haven’t noticed any issues. I think it’s ready for another look.
Yes, this is exactly what I was saying. And I believe that this now preserves the optimizations that have been made to this code. Thanks @claytonrcarter !