helix icon indicating copy to clipboard operation
helix copied to clipboard

Prioritize shebang if no file extension is provided

Open Brixy opened this issue 2 years ago • 9 comments

Helix: compiled from master branch
OS: Void

Hi!

In my bin folder there are some single letter POSIX/Bash executables: c (calendar), p (password script) and f (fuzzy finder).

Although the correct shebangs are provided (#!/usr/bin/[sh,bash]) Helix recognizes the files’ languages as c, perl and fortran.

It is easy to solve this problem, e.g. by renaming the files and using the shells abbreviations or aliases, by adding [.sh,.bash] file extensions or by adding file-types to languales.toml like this:

[[language]]
name = "bash"
file-types = ["c", "f", "t"]
shebangs = ["sh", "bash", "dash", "zsh"]
# …

I just wanted to mention this in case it is a Helix issue.

Should helix generelly prioritize the shegang if no file extension is provided?

Thanks a lot!

Brixy avatar Feb 07 '23 12:02 Brixy

In general Helix checks for the shebang if it can't find a file-types entry that matches. The file-types entries though will match for filenames that match the exact file-type entry like c (C), f (Fortran) and t (Perl). We check for exact matches so that we can support full filenames like Makefile or go.mod. (Relevant code is here)

We could change the form expected for full filenames to something similar to what we use for suffix file-types (see https://docs.helix-editor.com/master/languages.html#file-type-detection-and-the-file-types-key)

[[language]]
name = "ruby"
file-types = [{ filename = "Rakefile" }, "rb"]

but this would be a breaking change for languages.toml configuration.

the-mikedavis avatar Feb 07 '23 14:02 the-mikedavis

Thank you very much for your detailed answer!

We could change the form expected for full filenames to something similar to what we use for suffix file-types … but this would be a breaking change for languages.toml configuration.

I cope well von one of the posted solutions; but I cannot judge whether a breaking change is justified ;-)

Please give me a hint if you want me to close this issue. Thanks a lot!

Brixy avatar Feb 07 '23 15:02 Brixy

I'll leave this open and add some tags to it. Since there's a workaround (the languages.toml snippet) and it's unlikely to occur most of the time, I don't think it's worth making the breaking change to configuration for now. Eventually we want to replace the TOML configuration with something more powerful and we might be able to use that change as an opportunity to clean up some config options that need to be tweaked like this.

the-mikedavis avatar Feb 07 '23 23:02 the-mikedavis

One option would be to change the format to:

file-types = [{ filename = "c" }, { extension = "c" }]

And then anything that isn't a struct is legacy compat.

askreet avatar Feb 09 '23 00:02 askreet

~~I think this edge case is rare enough that a breaking change isn't justified~~ (edit: I no longer think this, see comment below) But, if you are planning to change the configuration format, I'd suggest something like:

[[language]]
name = "git-config"
file-types = { filenames = [".gitmodules", ".gitconfig"], suffixes = [".git/config", ".config/git/config"] }

This way each string is explicitly listed as a filename/extension/suffix, unlike the current format where strings like "c" match both filenames and extensions, when it should really only match extensions. It's also more concise than file-types = [ { filename = ".gitmodules" }, { filename = ".gitconfig" } ], and if the one line gets too long, you can expand the table into:

[[language]]
name = "bash"
[language.file-types]
filenames = [".bash_login", ".bash_logout", ".bash_profile", ".bashrc", ".profile", ".zshenv", ".zlogin", ".zlogout", ".zprofile", ".zshrc", "APKBUILD", "PKGBUILD", "eclass", "ebuild", "bazelrc", ".bash_aliases"]
extensions = ["sh", "bash", "zsh"]
shebangs = ["sh", "bash", "dash", "zsh"]

howard36 avatar Feb 09 '23 09:02 howard36

I just hit this with a shell script named js (JIRA search). It had a shebang but was interpreted as a JavaScript file.

rcorre avatar Mar 03 '23 16:03 rcorre

I think syntax highlighting based on filenames is frequent enough to be a valid consideration:

Language Filenames
Shell .bash_aliases .bash_functions .bash_history .bash_logout .bash_profile .bashrc .cshrc .flaskenv .kshrc .login .profile .zlogin .zlogout .zprofile .zshenv .zshrc 9fs PKGBUILD bash_aliases bash_logout bash_profile bashrc cshrc gradlew kshrc login man profile zlogin zlogout zprofile zshenv zshrc
Ruby .irbrc .pryrc .simplecov Appraisals Berksfile Brewfile Buildfile Capfile Dangerfile Deliverfile Fastfile Gemfile Guardfile Jarfile Mavenfile Podfile Puppetfile Rakefile Snapfile Steepfile Thorfile Vagrantfile buildfile
Starlark BUCK BUILD BUILD.bazel MODULE.bazel Tiltfile WORKSPACE WORKSPACE.bazel
TOML Cargo.lock Gopkg.lock Pipfile pdm.lock poetry.lock
YAML .clang-format .clang-tidy .gemrc CITATION.cff glide.lock yarn.lock

I'm personally in favor for this breaking change, in a format similar to what @Plasma-Vortex suggested. I think Shebangs should take priority, followed by path suffixes, filenames and then extensions:

[[language]]
name = "bash"
[language.file-types]
shebangs = ["ash", "bash", ...]
suffixes = []
filenames = [".bash_aliases", ".bash_history", ...]
extensions = [".sh", ".bash", ...]

[[language]]
name = "git-config"
[language.file-types]
shebangs = []
suffixes = [".git/config", ".config/git/config"]
filenames = [".gitmodules", ".gitconfig"]
extensions = []

Implementation

I got the filenames in the first table from a YAML file maintained by Github:

Shell:
  type: programming
  color: "#89e051"
  aliases:
  - sh
  - shell-script
  - bash
  - zsh
  extensions:
  - ".sh"
  - ".bash"
  - ".bats"
  - ".cgi"
  ...
  filenames:
  - ".bash_aliases"
  - ".bash_history"
  - ".bash_logout"
  - ".bash_profile"
  - ".bashrc"
  ...
  interpreters:
  - ash
  - bash
  - dash
  ...
  tm_scope: source.shell
  ace_mode: sh
  codemirror_mode: shell
  codemirror_mime_type: text/x-sh
  language_id: 346

We can use these to help us write the default configuration file. Hopefully, the defaults will be rich enough that most users won't have to touch the configuration files at all.

I don't see this upgrade being an issue for end users, I think this is a relatively self-explanatory scheme to migrate to.

k12ish avatar May 05 '23 11:05 k12ish

I changed my mind after running into this issue myself (with a file named c), and I've updated my earlier comment to reflect that. I wouldn't mind introducing a breaking change for this (and if we plan to change filetype detection eventually, we might as well do it earlier, since the languages Helix supports will only grow over time)

I still think we should use the format described above, ~~and would be willing to implement it~~. Using Github's list of filenames and extensions sounds like a good idea.

Edit: Unfortunately I don't have the bandwidth to work on this. But if anyone wants to submit a PR for this, go ahead!

howard36 avatar May 14 '23 07:05 howard36

Just wanted to mention that this merged pull request(?) seems to have solved my initial issue.

EDIT: O sorry. I posted rubbish. I just forget about my workarounds. Still an issue. Sorry, guys!

Brixy avatar May 21 '23 21:05 Brixy

The issue with files named c, p or f was resolved with #8006 but file-types is still preferred over shebangs. IMO a shebang is a stronger hint that the file is in a particular language than the filename. Currently, systemd *.conf shell scripts are detected as HOCON despite the shebang.

Ordoviz avatar Jul 31 '24 09:07 Ordoviz