delta
delta copied to clipboard
🚀 Highlighting of --word-diff output, and custom "squashed" diff output
Does delta support diff --word-diff
?
In a quick check - no. Would be awesome to add though.
Hi @ryanberckmans (and @AvnerCohen), thanks for this! Could you expand on exactly what you'd like to see here?
In a sense Delta does provide something like --word-diff
, in that it infers and highlights within-line edits (using a Levenshtein-type DP algorithm). However, I'm not saying that there wouldn't be benefit in working with git's native output, and also extending and improving the algorithms implemented by Delta (see #140).
Are you thinking of replacing git's default (and ugly) {+
and [-
markers with background colors? (Presumably in this mode, Delta's native within-line edit inference would be disabled).
_SUPPORTED_PYTHONS = [-['3.7',-]{+['3.8', '3.7',+} '3.6', '3.5', '3.4',[-'3.3',-] '2.7']
How about something like:
_SUPPORTED_PYTHONS = [~['3.7',]~{['3.8', '3.7',} '3.6', '3.5', '3.4',[-'3.3',-] '2.7']
With relevant color scheme applied as well ?
What would be nice is if the color is applied not across all lin, but just on the +- parts.
@AvnerCohen could you comment on how you see the role of delta's default within-line edit highlighting versus --word-diff
? For example, when would you use one versus the other? Do you have examples of cases where Delta's within-line edit inference is not doing a good job? (Delta's within-line edits (like GitHub, GitLab) are displayed on two different lines, unlike --word-diff
which combines the added and removed on a single line.) Note that you may want to play with --max-line-distance
to adjust Delta's within-line edit inference algorithm.
Again, I'm not suggesting we shouldn't support Git's --word-diff
output, I just want to understand how people view those two different, but somewhat similar, possibilities.
@dandavison Thanks for the time and effort.
First, I have checked back and seems like I got a bit confused, I actually meant "--color-words" and not "word-diff".
That being said, I have done couple of tests, and while I remember specific cases where color-words
was pretty helpful, I couldn't really find a specific case where this would be better than the correct default you suggest, with the specific part of the word highlighted, for me, it feels like it's just a matter of getting used to it.
@dandavison to answer your question on what I'd like to see here-
Simply put, right now delta doesn't seem to support --word-diff
. The output has no or incorrect highlighting.
Here's an example
In this screenshot, I'd expect the removals and additions to be styled according to the configured delta theme. Right now they are unstyled.
Thanks!
Thanks @ryanberckmans. I agree that that would make much more sense and that, whatever features delta offers, people will still send it --word-diff
output and it would be good for it to display it nicely.
Out of interest, do you have any comments on when you use --word-diff
versus relying on delta's within-line edit highlighting (below)?
git diff | delta --max-line-distance=0.6
@dandavison delta's within-line edit highlighting displays on two lines, whereas --word-diff
shows "just the words that changed" on a single diff line. For example,
^ delta, uses two lines
^ --word-diff
, uses one line
At the very least I'd love to see --color-words
and --word-diff
not confuse delta quite so much.
@dandavison I would like to have this feature and help coding it. Could you give me some guidelines where to start?
Hi @pilif-pilif, thanks for the offer! Let's start thinking about this then.
I think that we need to start off with some design thinking before getting to code changes. Basically, what exactly are we aiming for? One question is:
- Do we give users the option for the lines with
--word-diff
markers in them to be syntax highlighted by delta? Or do we forget about syntax highlighting and just aim to style the output appropriately?
Syntax highlighting might be quite hard: I think we would have to parse the input to reconstruct the original minus/plus lines of valid syntax in the target language, send those to the syntax highlighter, and then recombine them.
Honestly, after writing the notes below, I am starting to wonder about a different approach here: maybe we should just emit the diff --word-diff
output in raw
mode, so that the colors that git applies come through unchanged?
What are your initial thoughts about how you want this to behave when you are finished with it?
Here are some more notes. I'm just writing down my initial thoughts, so I am probably overlooking possibilities.
Here is some raw output from git --word-diff
, not using delta:
We see that
- Git has added some special markup to the code:
[-removed-]
and{+added+}
- Git is also emitting ANSI color escape sequences causing these to be displayed as red and green
Now, here is the same output, but using delta:
We see that
- Delta has removed the red and green from git
- Delta has syntax-highlighted the code as Rust
The [-
and {+
markers mean that this is not valid Rust syntax, and so the syntax-highlighting could have gone wrong. But in fact, in this case, it doesn't look too bad. In contrast, here's an example where the syntax-highlighting isn't quite what we'd want:
Hi @dandavison,
sorry for the delay.
I think that raw
for only the [-
and {+
part is the only real usable solution since it is a squashed diff. As a user I would still want syntax highlighting for the rest so i can better understand/see the Code where the change is happening.
When I use a squashed diff I want to see fast what changed where and in what environment/segment. So what
would be raw green, where
- raw red and environment
delta syntax highlighting.
so maybe delta can:
- stop removing the red/green ASCI color when there are
[-
and{+
markers - syntax highlighting treats
[-
and{+
markers as not valid syntax per default and changes nothing on the provided ASCI color
what do you think?
Hi @pilif-pilif, Two questions
-
Do you feel that you want syntax highlighting on the line containing
[-
and{+
itself, or would it be sufficient to have syntax -highlighting on surrounding lines but the line containing the word diff markers inraw
mode? -
Are you envisaging activating the new behavior via a new delta option, or are you hoping that delta will automatically detect the
--word-diff
markers? I'm not sure the latter is possible as (a) it may be valid syntax in some language, and (b) we'd have to parse the code to avoid being triggered by that pattern in strings/comments etc., (c) I don't think we have a nice way to interrogate the parent git process for whether--word-diff
is in effect (do we?)
syntax highlighting treats [- and {+ markers as not valid syntax per default and changes nothing on the provided ASCI color
The behavior here will depend on the sublime syntax definition for the language in question so I think all we can say is that the syntax highlighting behavior is undefined/unpredictable: it may change it, and as the example above showed a comment can confuse it.
So I'm not sure that I have a better suggestion than adding a new option to delta that causes it to emit the affected lines in raw mode throughout the entire line. Perhaps that new option is not specifically related to word-diff
but instead an optional regex that delta will test against each line and emit the line as raw if it matches (just thinking aloud, that might be a bad idea :) ).
Hi @dandavison,
1, I think it would be sufficient to have s.-highlighting on the surrounding lines only. That way the raw
line would be easy to spot.
- a new delta option would be ok
I want to check my logic - please correct me if I am wrong: (I know the ".." part is very abstract formulated)
-
git generates ASCII "colors"
-
delta gets "text" with ASCII "colors"
-
delta removes the ASCII part - with the new option delta would skip this step. or 3a. delta removes the ASCII part but before that saves the
raw lines
internally as per the new option -
delta sends only the "text part" to the syntax highlighter or 4a. delta sends normally the "text" to the syntax highlighter (nothing changes)
-
syntax highlighter "puts new ASCII" but leaves the ASCII from git per line or 5.a. if 3a then delta restores saved lines - that way syntax highlighting will not break.
-
Less displays everything.
can delta influence directly the syntax highlighter prosses?
I want to check my logic - please correct me if I am wrong:
Yes, that's basically accurate. As you say, delta saves a copy of the input line with ANSI color codes stripped and also saves the raw line with ANSI codes intact.
Then, in the main state machine parser loop delta calls handle_hunk_line
when its handling a "minus" (removed) or "plus" (added) or "zero" (unchanged) line.
I think that you are going to want to write a new version of handle_hunk_line
specially for this git-word-diff
feature. Here's the current handle_hunk_line
: https://github.com/dandavison/delta/blob/a4f5a4fa686a8db72afbb4f07ae906effbf0fe8e/src/delta.rs#L458-L525
As you can see it's basically a match statement handling the 3 cases: minus, zero, and plus. These are recognized by a '-'
, ' '
, and '+'
character at the start of the line respectively. But with --word-diff
, that's no longer true: instead of these 3 types of lines, you have two types of lines:
- Squashed-changed lines. This is a new sort of line that delta doesn't previously know about. They are recognized by having substrings like
[- ... -]
and/or{+ ... +}
. We want to emit these raw (look athandle_hunk_line
-- it shows how to emit a line as raw). - "Zero" (unchanged) lines. But with
git-word-diff
these don't start with a leading' '
character; they are recognized simply by not being squashed-changed lines. These you want to syntax-highlight (so look athandle_hunk_line
for that)
Here's the code location downstream where we see delta not syntax-highlighting line that should be emitted "raw" (the fact that is should be emitted "raw" is represented by the enum instance containing a non-empty value. In fact the data contained by the enum is the raw line characters.)
https://github.com/dandavison/delta/blob/a4f5a4fa686a8db72afbb4f07ae906effbf0fe8e/src/paint.rs#L456
So hopefully that helps a bit. I'm going to add some more thoughts in a separate comment.
I think that we should carry on thinking about how we can make this feature as good as possible. Maybe this question helps: what would your absolutely ideal way be for delta to display --word-diff
output, ignoring any implementation challenges?
Here are some more higher-level notes:
Delta already has two ways of displaying diffs: traditional, and side-by-side.
But git has a third style: what you are calling "squashed" (which seems like a good name). These are produced by git --word-diff
.
The aim of your work could be said to be two different things (which one do you think of it as?):
- To handle
git diff --word-diff
- To add the "squashed diff" style to delta.
If the aim is (2), there is a totally different approach we could take: we could forget about git diff --word-diff
entirely, adding an entry like this to the README:
delta does not support
git --word-diff
output. If you want this style of diff, use delta'ssquashed
option (or whatever it would be called).
Then you would add to delta the ability to squash the minus and plus lines into a single squashed line. That way the lines would already be syntax highlighted, you wouldn't have to worry about parsing/recognizing any weird git output containing [-
etc, and you could focus on how to style the removed and added parts of the squashed line.
In contrast, what we're talking about above is (1): i.e. having git do the squashing and delta learning how to parse and style that style of output.
If we are doing (1) then one thing that worries me is how will you handle someone using git diff --word-diff=color
? There the git output does not contain the [-
and {+
markers; instead removed and added material is indicated by color alone. Here's an example of that:
and for comparison here's normal git diff --word-diff
:
Perhaps we ignore the [-
and {+
markers and do the whole thing by recognizing the red and green colors in the raw git output? (And strip the [-
and {+
markers if they are present). And perhaps you could actually syntax highlight everything: red and green tells you what is removed and what is added, so you could reconstruct the original code, and send it to the syntax highlighter, and then reassemble it into a single squashed line, with the desired (background color) styling.
I think this gets back to my question above: in an ideal world, if we could just ask for any implementation of this and get it with no work, what would we ask for?
cc @th1000s just in case you have any thoughts: the topic here is git's --word-diff
output format, which merges sections of added and removed content together on a single line.
It seems to me that there are two related objectives:
(a) Handle git diff --word-diff
output from git
(b) Design how delta should visually present added and removed content together in a single "squashed diff" line
So far we have been talking as if the new feature would do (a) and (b) so that a user who enters git diff --word-diff
or git diff --word-diff
will get the single-line squashed output from delta, and this would be the only way to get single-line squashed diff output from delta.
But another way of approaching this would be to
- Implement parsing of
git diff --word-diff
output to delta's standard internal data structures ("minus" and "plus" lines, and "zero" lines) - Implement conversion of delta's standard internal data structures to a single-line squashed-diff output, somewhat like what git produces with
--word-diff
.
This way a delta user would be able to select single-line squashed diff output whether or not they had passed --word-diff
to git.
On a), handling the input: Detecting unified diff vs. word diff format is probably not that difficult, when a hunk contains no +/-
characters in column zero it (only probably) is word diff output. Or (platform dependent?) look at the arguments of the parent process. Then, parsing [-removed-]{+added+}
unambiguously can't be guaranteed either, which would leave grabbing the ANSI. That would mean users have to be instructed to use --word-diff=color
or --color-words
(or even =porcelain
). But the colors are not always red/green, e.g. git -c color.diff.old=yellow -c color.diff.new=cyan diff --color-words
.
And b), unless I misunderstand the issue, should just be the standard red/green background.
2.) Sounds much better, only the --word-diff-regex
setting from git would be lost. However, then configuring delta
as the pager for git would always use the word diff view. There should probably be a quick way to switch that on or off.
Thanks for linking me here @dandavison and apologies for my delay. As is often the case various things take priority over tinkering with tools but every time I get back to tools I am reminded that it's the foundation for my success.
I'm grateful for the ground that has already been covered on these topics here, and I would like to contribute from a few directions:
My personal experience with an independent diff input format
I fell down the rabbit hole of enhanced line diffs on the command line ever since I caught a glimpse of intra line highlighting via things like diff-highlight
(a perl script that comes with git? I've had it in my scripts repo in $PATH
forever), and github. In mid 2013 I began maintaining a python script that used diff_match_patch as a dependency to obtain character-wise diffs and quickly converted that (because it was slow) to a particular C++ STL implementation of diff-match-patch ("dmp
") which this py script would exec. As such, I never went deep with git --word-diff
. I think that git diff may well provide many of the same capabilities, I think there is even regex tokenization configuration available. git word diff can be used outside of git using --no-index
. It is merely happenstance that I didn't choose to use it and reinvented this wheel myself.
You may see in the above linked discussions #776 #785 where I have posted screenshots, my tool is fundamentally pretty similar to git word-diff
. All my python script sift
does is massage the raw protocol output from C++ dmp
into suitably-placed ANSI background/reverse codes.
One of the many deficiencies that I wanted to address with stuff like diff-highlight (and which I perceived on cursory evaluation of other options, and I might add this applies to github as well is that the results are sort of suboptimal in the sense that no effort is put toward bringing out a lot of the details in intra-line changes. For example please compare the quality of highlighted regions between
(assume the usual where + line is green and - line is red)
and
As you might imagine I have become rather reliant on these maximally concise diffs where there are several main optimal properties:
- no unchanged context content is ever duplicated, so context uses up the least amount of space possible
- to my knowledge this can produce intelligible output highlighting under many stressful scenarios, when alternative less intensive diff algorithms cannot succeed. Examples include:
- text dumps of SQL tables for double-checking transactions,
- automatic code formatting applied to code: confirming no functional code changes. You would think that something like
diff-highlight
can take care of this when only changing whitespace, but as already shown when more than one change exists on a given line, it comes up short, and as a general rule, line based tools including Github often completely come apart at the seams even when only whitespace changes are introduced, in particular when newlines get introduced or removed, - something I did the other day which was the comparison between a Jenkins consoleText output and a
tmux capture-pane -eJ
export, which inserts a large number of trailing spaces in each line and many small insignificant changes
So far I've never had a need for changing the behavior of this from doing a kind of brute force character-wise diff. I have not had a chance to more deeply explore git --word-diff
, i would want to adjust its output somewhat and wrap it still if I were to use it, for example added or removed newlines don't show well in the color
configuration.
Now computing a character-wise diff does increase computational load compared to regular diff implementations and I use it for almost, but not all, git diff viewing scenarios: I keep aliases around to use lesser, much faster diffs, and use them often when engaging in deep archaeology sessions because the runtime of dmp
appears in practice to be a bit worse than n*log(n) complexity (an area to investigate another time).
All of this to say, I have a unique perspective on this issue of "squashed-diff" output, I come bearing a third possible input to Delta, and am strongly in support of having flexibility of input for Delta so that I may apply it to my own needs. I want to be able to use all diff tools at my disposal in git, and I've described (unfortunately not very concisely) the delightful extent of the benefits inside and outside of git that can be had from dedicating a little more compute to diff calculation.
I feel that Delta has independently gone down the same path in this regard, to great effect and with a much more efficient implementation. When I consider choosing between using Delta or sift
with git diff
, there is usually no contest because Delta is a more pleasant experience in most cases, whether unified or side by side (quite the accomplishment I'd say). It is a great deal faster. But in many examples when only small syntactical changes are present, a large amount of actual output is produced, making quickly scanning and understanding the change more difficult than it has to be! A small amount of this is down to differences in comparison token boundary definitions, which I think are configurable and can be optimized further to reduce the amount of raw change characters highlighted (note, brute char-wise diff does go overboard finding useless matching patterns and causing garbled output, so it is not always better), a larger part of the extra content is due to duplication found in the minus- and plus- context lines. They're clearly helpful when changes become more complex, but they are also visually distracting when changes are simple.
So, the point I want to make (and finally coming back to the topic at hand):
word-diff is often more concise and readable
It's clear to me now that I have been driving Delta for a few weeks that its only weaknesses compared to my beloved sift
is verbosity. In the vast majority of cases, a change involves small portions of a few lines. The problem with Delta/github is that it will always use up two lines of context to show a change, no matter how small. Since I'm very used to it, I would be one of these squashed-diff users, so my two cents would be in support of enabling all the possibilities (threw in a --color-moved
related bullet but i think we should ignore that for this discussion as it is out of scope, but mentioning in case it is relevant in implementation):
- to be capable of identifying and interpreting different types of input diffs:
- vanilla ansi colored unified diff (already works, also works as raw stdin without ansi colors e.g.
diff -u ... | delta
) - word-diff in whatever flavors is most suitable for parsing, perhaps porcelain. Then I can edit
sift
to emit porcelain word-diff format and run e.g.sift | delta
for my non-git shenanigans for example as a way to perhaps input a diff suggestion (this feature might be ill-formed, sorry) - color-moved when enabled as default or zebra (they are the same) as I expect parsing dimmed-zebra may be nontrivial
- raw pair of input files (i'm not familiar with how it computes diff internally but i know it works)
- vanilla ansi colored unified diff (already works, also works as raw stdin without ansi colors e.g.
- to internally rebuild minus- and plus-lines in hunks and syntax-highlight each version of each hunk
- to have configurable output modes
- unified
- side-by-side
- squashed
On syntax highlighting of concise/squashed diff output
I am not sure if there was a consensus regarding whether/how to highlight the squashed diff lines.
I think the minus and plus sections themselves never have ambiguity in their syntax. They should just be highlighted per their computed values based on the minus- and plus- reconstructed hunks.
It is the context lines that have ambiguity. This is, after all, why I believe we need to consider even in the squashed-diff mode we're proposing, that we may still want to have a configurable mode where if the syntax in the unchanged context would be different that we should show the minus- and plus- unchanged context content. A simple example of this is if some lines become commented. They turn from code syntax to comment syntax. I know what I would like to do to view these scenarios is I would like to see both versions of diff, squashed to make it easier to see all of the chars changed only (i'd do it today with sift
!), and I'd like to also see it expanded so I can confirm the green sides all have commented syntax (today Delta crushes this task). So I argue this should be an option.
Misc. (somewhat off topic but might help someone who has gone off the deep end like me)
I've found it difficult ever since I discovered Delta recently, to toggle between 3 ways to view git diff,
- delta
- sift (its output getting sent to delta works fine it turns out -- delta knows not how to interpret it so it spits it back out verbatim, but I get whatever delta's been configured for for the header styles/decoration)
- vanilla (
git --no-pager diff --color=always | less
as a way to disable delta as pager) as a backup which always runs fast, which I will probably never use going forward now that I have delta
The issue I was running into was that if I configure an external diff (diff.external
), --color-moved
never gets applied. This makes a little bit of sense, but I think it is a bug, because I believe when --no-ext-diff
is specified it should behave as though no external diff is configured, but it doesn't do that, it seems to disable --color-moved
.
In conjunction with this, since this issue means that I cannot use a diff.external
because I do want to benefit from --color-moved
, I had a great deal of difficulty enabling sift
inside git log -p
which I do use a great deal.
I just did some testing and was able to find the workaround!
It is possible to use GIT_EXTERNAL_DIFF=sift git log -p --ext-diff
to enable an ext diff for log -p
. I think I have all the puzzle pieces for this again now. This makes me feel much better being able to have all the choices for the most capable tools at hand. But most ideal would be if one tool can be the best in all scenarios, that is the dream, one diff to rule them all!
The current release (0.10.3) refuses to handle --word-diff
or --color-words
at all, instead allowing the raw git output through. I think this is at least a step in the right direction since what delta was doing previously to that input was entirely invalid.
$ git diff --word-diff
fatal: invalid regular expression: [a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.eE]+i?|0[xX]?[0-9a-fA-F]+i?|[-+*/<>%&^|=!:]=|--|\+\+|<<=?|>>=?|&\^=?|&&|\|\||<-|\.{3}|[^[:space:]]|[�-�][�-�]+