miette
miette copied to clipboard
Syntax highlighting support
Add support for syntax highlighting through syntect.
I figure this can be done by having the graphical handler take an optional SyntaxSet, and have an optional identifier token method as part of SpanContents
, so the renderer can color code accordingly.
The biggest challenge here is designing things such that things are appropriately feature flagged and don't grow binaries too much?
If you're looking for something a little less regexpy (TextMate grammars are basically large regexp soups if I recall correctly), you may want to look at https://lib.rs/crates/tree-sitter - I've had good experiences with it, it was just missing a couple grammars so I rolled my own crate to build them: https://github.com/fasterthanlime/tree-sitter-collection
(Hopefully that's useful input, if not feel free to collapse that comment!)
Oh hey no this is actually really cool and relevant! I wonder what the story would be with people writing their own custom ones 🤔
The biggest challenge here is designing things such that things are appropriately feature flagged and don't grow binaries too much?
I wonder if this is easy to be benchmarked. That might be really helpful
@zkat I made a very rough draft attempt at writing a custom SourceCode for integrating syntect, but I've been running into issues with getting the labels to render on the correct columns. I basically adapted the context_info
function from miette and added some logic to ignore ANSI escapes to try to trick the graphical reporter into doing the right thing, but it may not be possible without a new custom reporter or I may be misunderstanding something about what the SpanContents is supposed to represent.
My current test example looks like this:
Code is here: https://github.com/kallisti-dev/miette-syntect/blob/40f1a63b398e9c9cbda964076cd1d11e7b538c56/src/highlighted_source.rs#L195
I'm likely just doing something wrong here, but I'm not sure what.
I might try this as a PR to miette instead (with everything behind a feature flag) since that would make it easier to ensure it's doing the right things with labels, and also makes it possible to only run the highlighter on the actual part that's being rendered. my version highlights the whole source because the 'a
constraint on SpanContents data()
makes it difficult to reference anything that's owned by the SpanContents itself.
As a side note, you'd probably also want a smarter renderer because the default syntect::util::as_24_bit_terminal_escaped
inserts a lot of unnecessary escape codes. the rendering logic in bat is smarter and would be good to use as a reference.
Some more thoughts on this.
My initial idea was to try this as a custom SourceCode
, since that gives the user the ability to customize which syntax highlighting they use per-file rather than having the reporter try to figure it out. It also doesn't require an entirely different ReportHandler
beyond what ReportHandlerOpts
provides, which is convenient. However it looks like there needs to be a new ReportHandler
that's aware of ANSI escape sequences so that it can ignore them when handling label placement.
I suppose an idealized API would be that you set the SyntaxSet
and Theme
in the ReportHandler
options and then you set the SyntaxReference
on each individual SourceCode
using wrappers over the find_syntax_*
family of functions from syntect. But I'm not sure how you would pass the SyntaxSet
and Theme
to the SourceCode
when it comes time to render without an overhaul of the current design. I think it requires too many changes to miettes design for something that should be an optional feature.
Maybe the simpler approach in my example could work, where the SourceCode
type takes everything needed for highlighting, but we just augment the existing GraphicalReporter
to simply be aware of and ignore ANSI escape sequences when rendering?
I tried several approaches to only highlight the SpanContents
instead of the entire SourceCode
, but couldn't get anything working under the constraints of the SourceCode
and SpanContents
traits as they currently exist, so I resigned myself to just ANSI escaping the whole source code all at once.
A change to get this working would be a slight modification to the SpanContents
trait to allow something like fn data(&self) -> Cow<'a, [u8]>
so that it's easier to allocate new data containing the ANSI escapes.
Alternatively, changing SourceCode::read_span
to have a &mut self
receiver could also work because then the SourceCode
implementation can cache the syntect ParseState
and HighlightState
and the ANSIfied lines in a sparse map of the form IndexMap<usize, String>
. But I don't necessarily think this is the best choice, because it doesn't really make sense for the API to use a mutable reference here, and interior mutability with a Mutex
lets you workaround this...
...almost. The &'a [u8]
constraint in SpanContents::data
strikes again, making it impossible to generate any data via interior mutability that outlives the Mutex
lock. I think Cow<'a, [u8]>
would make all of this much simpler because I can simply return an owned Vec
of the ANSI escaped data in the SpanContents
rather than trying to figure out some way to create a slice with the same lifetime as the entire SourceCode
My thoughts on implementing this:
- the
SourceCode
trait should be extended with a new (optional) method:fn syntax<'a>(&'a self) -> Option<&syntect::parsing::SyntaxReference>
. - Change the
Line
struct ingraphical.rs
to include a newrendered: Option<String>
field, which will contain the post-highlighting version of the line. - Use the output from
highlight_line
to generate initial syntect stuff, then have logic on our end to convert itsVec<(Style, &str)>
output intoVec<(owo_colors::Style, &str)>
, and then apply that directly to the line, saving it asrendered: Some(line_with_styles_applied)
. This'll probably mean that we hard-code a mapping of one particular syntect theme to a generic "color" (non-RGB) mapping that will adapt across terminals. - Do all width/location calculation with
Line.text
, just like we do now. - Actually render
Line.rendered.unwrap_or_else(|| line.text)
when we're rendering lines themselves. - Keep the mutable
HighlightLines
state right intget_lines
. - (optional, more work?): allow configuration of scope highlights through the
ThemeStyles
struct. Probably just shove apub scopes: HashMap<String, Scope>
in there, whereString
is the scope name.
tl;dr I think all the actual highlighting logic should live entirely in GraphicalReportHandler
, but we can allow external configuration through an optional method in SourceCode
.
Quick update on this. I have a rough draft of this working locally. Using a similar approach to what you outlined above and keeping all the mutable state in GraphicalReportHandler
I'll submit a draft PR once I clean it up a bit.
Next steps are to design the API and document. It would be nice if the configuration API could wrap everything in syntect so the user doesn't need to include a syntect dependency for SyntaxReference or Theme, but I'm not sure what would be a sensible way to do this.
Making the choice of highlighter customizable with a trait would be nice as well, for tree-sitter support or for a more specialized highlighter. For my own use case, I really only need Rust highlighting so I would opt for a specialized parser/highlighter if I had the option to use one.
@kallisti-dev omg that looks amazing
Draft PR at #313
It ended up being easier to create a new global for the highlighter than it was to cram it into GraphicalReportHandler
, because doing so meant that it could no longer derive Debug
and Clone
, the latter of which it relies on in render_causes
. Not sure if that's the approach we wanna go with, but it certainly made things easier.
This adds a new syntect
feature flag, which pulls in fancy
automatically when used. It also automatically configures the global highlighter to use syntect by default, though the user can override the default settings (to override default theme, for example) with set_highlighter
.
There is also a generic Highlighter
trait. The goal with this design is to make the choice of highlighter customizable, and eliminate any exposing of crate-specific types in the API surface. I was using tree_sitter as a reference for what another syntax highlighting API might look like.
To make language detection easier, I added a name()
method to SourceCode
. This makes it possible for syntect to detect language via file extensions if the name corresponds to a file name (This would also pair well with a future SourceCode implementation for files such as #297).
I am also considering adding a language()
method to SourceCode, which would make the language choice explicit, similar to the syntax()
method mentioned above but agnostic of any particular highlighting crate. This string gets passed directly into SyntaxSet::find_syntax_by_name
so for example Rust, C, TOML, and TypeScript would all work. Right now this isn't exposed anywhere in the API for the user to configure, but I think it would make sense to add a with_language
method to NamedSource
to explicitly choose a language, as well as perhaps a source_code(language = "Rust")
syntax to the derive attributes which would only worked with a NamedSource
.
I think implementing language detection for tree-sitter might be a bit more of a challenge because it uses an opaque FFI type that you import from a language-specific crate (e.g. tree_sitter_rust
) to configure its language choice. So if we want to do tree sitter support it might be best to just let the highlighter struct itself choose how to detect language, by having a config hook such as FnOnce<&dyn SourceCode> -> tree_sitter::Language
and then just have the user supply that.
Here's what dbg!(syntect::highlighting::ThemeSet::load_defaults().themes["base16-ocean.dark"])
looks like:
https://gist.github.com/kallisti-dev/d43f4feb726ee15d50f7a15e7fb8c9d4
Maybe we could apply settings from GraphicalTheme
onto a custom syntect Theme
so that the user can customize it. It would also make supporting nocolor easier. Unfortunately it's a rather complicated structure so I'm not sure where to start.
https://gist.github.com/kallisti-dev/95ca7db90e6c635e0c3a67b858f39509
For anyone who wants to see what the default themes look like on their local setup, I've made a quick script to render a Rust hello world with all of them. You can run this locally on my branch to see how they render with your terminal settings.
Here's what it looks like on my screen, using Fira Code with VS Code terminal
I think a bolder color palette like base16-eighties.dark
or Solarized
would work well with the default miette theme.
@kallisti-dev I used a script to take screenshots of the example program with some different light and dark background themes:
IMO none of the syntect themes are readable across all the terminal themes I tested, with solarized probably being the closest.
I'm thinking the best thing to do might be to set our own rgb background color when using syntect. We'd also probably want to use a rgb theme for the rest of miette in this case, since the user's terminal theme isn't guaranteed to work with our background. It isn't great, but is probably the least-bad compromise until we get 16-color ANSI support with syntect.
theme testing script
This is very unlikely to work on somebody else's setup, but could probably be adapted to any linux wm without too much effort.
#!/usr/bin/env bash
set -e -u -o pipefail
out="$1"
themes=(
ayu-light
github-dark
github-light
gruvbox-dark
gruvbox-light-medium
ibm3270
mellow-purple
red-sands
solarized-dark
solarized-light
argonaut
ocean
)
mkdir -p $out
for theme in ${themes[@]}; do
nix run nixpkgs#theme-sh "$theme"
grim -g "$(swaymsg -t get_tree | jq -j '.. | select(.type?) | select(.focused).rect | "\(.x),\(.y) \(.width)x\(.height)"')" "$out/$theme.png"
done
montage "./$out/*" -mode Concatenate "$out.png"