zed icon indicating copy to clipboard operation
zed copied to clipboard

language: Select language based on longest matching path extension

Open claytonrcarter opened this issue 10 months ago • 4 comments

Closes #8408
Closes #10997

This is a reboot of my original PR from last year. I believe that I've addressed all the comments raised in that original review, but Zed has changed a lot in the past year, so I'm sure there will be some new stuff to consider too.

  • updates the language matching and lookup to consider not just "does the suffix/glob match" but also "... and is it the longest such match"
  • adds a new LanguageCustomFileTypes struct to pass user globs from settings to the registry
  • minor/unrelated: updates a test for the JS extension that wasn't actually testing what is intended to
  • minor/unrelated: removed 2 redundant path extensions from the JS lang extension

Languages that may use this

  • Laravel Blade templates use the blade.php compound extension
  • apparently Angular uses component.html
    • see also https://github.com/zed-industries/extensions/issues/169
  • hypothetically someone could publish a "JS test" extension w/ custom highlights and/or snippets; many JS tests use test.js or spec.js

Verifying these changes

I added a number of assertions for this new behavior, and I also confirmed that the (recently patched) Laravel Blade extension opens as expected for blade.php files, whereas on main it does not.

cc @maxbrunsfeld (reviewed my original PR last year), @osiewicz and @MrSubidubi (have recently been in this part of the code)

Release Notes:

  • Added support for "compound" file extensions in language extensions, such blade.php and component.html. Closes #8408 and #10997.

claytonrcarter avatar May 01 '25 00:05 claytonrcarter

Messages
:book:

This PR includes links to the following GitHub Issues: #10765, #https://github.com/zed-industries/extensions/issues/169 If this PR aims to close an issue, please include a Closes #ISSUE line at the top of the PR body.

Generated by :no_entry_sign: dangerJS against c87f7e266067333b01ecda58fa646901e7301952

zed-industries-bot avatar May 01 '25 00:05 zed-industries-bot

@maxbrunsfeld @osiewicz Nudging this, gently and politely. 😄 I've been using it w/o issue for the past few weeks, it will close 2 long standing issues, and staff have demonstrated interest in this feature/change several times in the past.

Input from @MrSubidubi would be welcome too, of course, esp as this change interacts with your recent changes in this area.

Thank you for taking a peek, if you're able!

claytonrcarter avatar May 13 '25 20:05 claytonrcarter

Small and potentially slightly off-topic coment:

I would possibly argue that in the long-run, glob patterns should also be available for language matching. There are some languages that would benefit from glob matches already and the ambiguity of file_suffixes is probably somthing you had to deal with a lot here. Generally, I think it might be safe to say that if a glob pattern matches, it might be a better match than just a suffix match and we could potentially also incorporate https://github.com/zed-industries/zed/pull/11697#discussion_r1605598630 more easily. Admittedly, I might also be missing something here, feel free to point that out should that be the case.

But then, I would guess that this is not something for now but rather something to keep in mind for later. With the tests you added, I think it would be easy to add this in later whilst ensuring that we do not regress on previous behavior.

Other than that, it does look good to me.

MrSubidubi avatar May 15 '25 08:05 MrSubidubi

glob patterns should also be available for language matching

Yes, I found myself wondering the same thing, but also felt that it was out of scope here.

Thank you for checking this out!

claytonrcarter avatar May 15 '25 11:05 claytonrcarter

Thanks for following up @claytonrcarter . Reviewing now; I might push some minor stylistic changes.

maxbrunsfeld avatar May 18 '25 17:05 maxbrunsfeld

@maxbrunsfeld Thank you for the review. I think I understood what you are asking for, but please correct me if I'm wrong:

My initial implementation was focused on comparing the length of the "pattern" that was being used for matching (ie the file extension or glob pattern), but this latest version is now storing the length of what that pattern actually matched (ie from the list of extension/filename/path in language_registry::language_for_file_internal().

An example:

  • extension 1 is looking for path_suffixes = ["php"]
  • extension 2 is looking for path_suffixes = ["blade.php"]
  • I open a file called templates/index.blade.php

Both extension 1 and extension 2 will be compared (in language_registry::language_for_file_internal()) against a list built from the file name, eg ["php", "index.blade.php", "templates/index.blade.php"]

In my previous version, I was comparing the length of the matched element from path_suffixes; so 3 for ext 1 (because "php" matched the filename) and 9 for ext 2 (because "blade.php" matched the filename).

In this version, we should be comparing the length of the matched element from language_registry::language_for_file_internal();

  • ext 1 will match the first element (because "php" == "php) and will score 3
  • ext 2 will not match the first element, but will match the second, (because "index.blade.php".ends_with("blade.php")) and will score 15

I hope that this is close to what you are asking for; please advise it it's not! The tests are passing on this, and I've tested it a little just now, but I'll put it through its paces tomorrow and see if any issues pop up.

Thank you again for the review on this!

claytonrcarter avatar May 19 '25 02:05 claytonrcarter

I’ve been using this latest implementation for the past day or two and haven’t noticed any issues. I think it’s ready for another look.

claytonrcarter avatar May 21 '25 12:05 claytonrcarter

Yes, this is exactly what I was saying. And I believe that this now preserves the optimizations that have been made to this code. Thanks @claytonrcarter !

maxbrunsfeld avatar May 26 '25 17:05 maxbrunsfeld