
🚀 [syntax highlighting] improved language mappings via shebang

Open Kr1ss-XD opened this issue 5 years ago • 7 comments

Some tools (especially bat) check source files for a shebang line and, if one is present, use it to select the corresponding syntax rules. I'm wondering whether this would be possible for delta, too.

Currently, a source file seems to be considered a specific language only if its name/extension can be mapped to a known language. Therefore, generically named files (e.g. executable shell scripts without *.sh filename extension) are not being syntax highlighted by delta. Considering a shebang could be an option in addition to filenames or --map-syntax.

Maybe it's even possible to reuse the algorithm/regexes that bat has already implemented?
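To make the shebang idea concrete, here is a minimal sketch in Rust (the language delta is written in). It is not bat's or delta's actual algorithm: the function names `interpreter_from_shebang` and `label_for_interpreter` and the returned labels are hypothetical and only illustrate the lookup.

```rust
/// Extract the interpreter name from a shebang line, for example
/// "#!/usr/bin/env python3" gives "python3" and "#!/bin/sh" gives "sh".
/// Returns None when the line is not a shebang.
fn interpreter_from_shebang(first_line: &str) -> Option<&str> {
    let rest = first_line.trim_end().strip_prefix("#!")?;
    let mut words = rest.split_whitespace();
    let program = words.next()?;
    // Keep only the basename of the interpreter path.
    let mut name = program.rsplit('/').next()?;
    if name == "env" {
        // Skip env options (such as -S) and VAR=value assignments,
        // then take the first remaining word as the real interpreter.
        let real = words.find(|w| !w.starts_with('-') && !w.contains('='))?;
        name = real.rsplit('/').next()?;
    }
    Some(name)
}

/// Map an interpreter name to a language label.
/// The labels are hypothetical, not the syntax names delta or bat use.
fn label_for_interpreter(name: &str) -> Option<&'static str> {
    // Drop a trailing version such as the "3.11" in "python3.11".
    let base = name.trim_end_matches(|c: char| c.is_ascii_digit() || c == '.');
    match base {
        "sh" | "bash" | "dash" | "ksh" | "zsh" => Some("sh"),
        "python" => Some("python"),
        "ruby" => Some("ruby"),
        "perl" => Some("perl"),
        "node" => Some("javascript"),
        _ => None,
    }
}
```

The catch is that this needs the first line of the file, which a diff hunk usually does not contain.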

Kr1ss-XD, Jan 01 '21 15:01

Hi @Kr1ss-XD, the core issue here is that bat has access to the entire file, whereas delta (in its current form) only has access to the section of the file that happens to be in the diff hunk. It would be possible to change delta so that it (optionally) tries to find the file on disk (or from the git repo via libgit2). I have wondered from the beginning whether we would want to do that. Of course, the file might not even exist, since delta simply accepts whatever diff is given to it on stdin, which could be entirely fictional.
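A rough sketch of the optional on-disk lookup, assuming the filename has already been taken from the diff header. The function name `first_line_on_disk` and the 1024-byte cap are illustrative choices, not part of delta. A missing or unreadable file simply yields `None`, which covers the "entirely fictional diff" case.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Read};
use std::path::Path;

/// Try to read the first line of `path` (a filename taken from a diff header).
/// Returns None if the file is missing, unreadable, empty, or its start is
/// not valid UTF-8, so a diff that matches nothing on disk is handled quietly.
fn first_line_on_disk(path: &Path) -> Option<String> {
    let file = File::open(path).ok()?;
    // Cap the read so a large binary file without newlines is not loaded whole.
    let mut reader = BufReader::new(file.take(1024));
    let mut line = String::new();
    let bytes_read = reader.read_line(&mut line).ok()?;
    if bytes_read == 0 {
        return None;
    }
    // Strip the trailing line ending, if any.
    Some(line.trim_end_matches(&['\r', '\n'][..]).to_string())
}
```

The caller would fall back to the existing filename-based detection whenever this returns `None`.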

dandavison, Jan 01 '21 16:01

Right, I'm aware that it's not as simple for delta as for bat which is given a filename as argument most of the time.

Since delta can recognize filenames in some cases, I wondered if it could utilize these to open the file and check its contents.

> Of course, the file might not even exist, since delta simply accepts whatever diff is given to it on stdin, which could be entirely fictional.

I hadn't considered this, though.

Kr1ss-XD, Jan 01 '21 16:01

> Since delta can recognize filenames in some cases, I wondered if it could utilize these to open the file and check its contents.

Yes, I agree, this would be possible. And as you say, for things like executable shell scripts, I think it's the only way forwards.

dandavison, Jan 01 '21 16:01

This would be really nice, @dandavison.

Problem

I have a lot of scripts (e.g. Python) without the .py extension, and getting a syntax-highlighted diff for these would be a game-changer.

delta should be able to auto-detect the language of files in the diff by parsing the hunk headers and running e.g. file on them.

This could be a step that is only run when the filename has no extension at all, so it shouldn't be computationally expensive. You can rely on file so there's no need to even parse the shebang line (which can get complicated).

$ file bin/my-issues
bin/my-issues: Python script text executable, Unicode text, UTF-8 text
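As a sketch of that "only when there is no extension" step in Rust: the helper names below are hypothetical, and the code assumes a `file` program on the PATH that accepts `--brief` (print the description without the filename prefix). If `file` is absent or fails, the result is `None`.

```rust
use std::path::Path;
use std::process::Command;

/// True when the file name has no extension, so an extension-based
/// lookup cannot identify the language.
fn lacks_extension(path: &Path) -> bool {
    path.extension().is_none()
}

/// Run `file --brief <path>` and return its description,
/// or None if the program is unavailable or exits with an error.
fn describe_with_file(path: &Path) -> Option<String> {
    let output = Command::new("file").arg("--brief").arg(path).output().ok()?;
    if !output.status.success() {
        return None;
    }
    let text = String::from_utf8(output.stdout).ok()?;
    Some(text.trim().to_string())
}

/// Only pay for the external process when the name has no extension.
fn describe_if_extensionless(path: &Path) -> Option<String> {
    if lacks_extension(path) {
        describe_with_file(path)
    } else {
        None
    }
}
```

Note that `Path::extension` also reports no extension for names such as `Makefile` or `.bashrc`, so a real implementation would probably try known file names before falling back to `file`.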

$ git log -p bin/my-issues | delta -n
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
commit cde440bb4ea6ae2b957c6ba9fa59640c596af120 ┃
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━
Author: Zach Riggle <REDACTED>
Date:   Fri Jul 2 16:36:59 2021 -0500

    Add --entire-problem to my-issues


bin/my-issues
────────────────────────────────────────────────────────────────────────
<non syntax-highlighted diff>

Solution

I've created a ZSH script which automatically finds all of the interpreters known to file. You can find a copy of it here: https://gist.github.com/zachriggle/e82ba2b7f6ea55df853fab03c243876d

To save you the time of running it, here's the output on my system.

ash: Neil Brown's ash script text executable, ASCII text
awk: awk script text executable, ASCII text
bash: Bourne-Again shell script text executable, ASCII text
csh: C shell script text executable, ASCII text
ksh: Korn shell script text executable, ASCII text
luacore: Lua script text executable, ASCII text
node: Node.js script text executable, ASCII text
perl: Perl script text executable
python: Python script text executable, ASCII text
ruby: Ruby script text executable, ASCII text
sh: POSIX shell script text executable, ASCII text
stapler: Systemtap script text executable, ASCII text
tclassutil: Tcl script text executable, ASCII text
tclsh: Tcl/Tk script text executable, ASCII text
tcsh: Tenex C shell script text executable, ASCII text
zsh: Paul Falstad's zsh script text executable, ASCII text

You should be able to add these few bits to git-delta and use file to autodetect syntax, using the above as a mapping.

zachriggle, Jul 13 '21 00:07

@dandavison I created a simple solution to this issue: you should be able to run file on files without an extension and use the above mappings to automatically determine the syntax of a given file.

You may want to strip everything after the first comma (e.g. ASCII text or Unicode text).
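A minimal Rust sketch of that lookup, under a few assumptions: the descriptions on the left are copied from the list above, the labels on the right are hypothetical placeholders rather than delta's real syntax names, and the function name is made up. The exact wording printed by file can differ between versions, so exact string matching like this is brittle.

```rust
/// Map one `file` description to a language label.
/// Returns None for anything that is not a recognized script type.
fn label_from_file_description(description: &str) -> Option<&'static str> {
    // Keep only the part before the first comma, for example
    // "Python script text executable, ASCII text" becomes
    // "Python script text executable".
    let head = description.split(',').next()?.trim();
    let kind = head.strip_suffix(" script text executable")?;
    match kind {
        "Python" => Some("python"),
        "Ruby" => Some("ruby"),
        "Perl" => Some("perl"),
        "Lua" => Some("lua"),
        "Node.js" => Some("javascript"),
        "awk" => Some("awk"),
        "Tcl" | "Tcl/Tk" => Some("tcl"),
        "Bourne-Again shell" | "POSIX shell" | "Korn shell" | "Neil Brown's ash"
        | "Paul Falstad's zsh" => Some("sh"),
        "C shell" | "Tenex C shell" => Some("csh"),
        // Unlisted kinds (for example Systemtap) fall through to no highlighting.
        _ => None,
    }
}
```

For the `bin/my-issues` example, the description "Python script text executable, Unicode text, UTF-8 text" would map to the `python` label.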

zachriggle, Jul 13 '21 02:07

Thanks @zachriggle. One thing we should check before proceeding is whether there is a Rust crate that already does this and looks reliable. Let me know if you're aware of anything.

dandavison, Jul 13 '21 09:07

@dandavison syntect has a relevant function, find_syntax_by_first_line. Thanks to @jplatte for pointing this out in a private conversation.

flxai, Feb 07 '24 11:02