
Create and publish a library crate

Open petr-tik opened this issue 3 years ago • 7 comments

Hey Wilfred, Thanks for such a great command-line tool.

I was wondering if you would consider publishing a library with your work to provide an interface similar to dissimilar's diff https://docs.rs/dissimilar/1.0.3/dissimilar/fn.diff.html

It would really help people writing tests that calculate and highlight diffs to have AST-powered diffs. I am thinking rust-analyzer and the myriad of text editors being implemented in rust atm.

Appreciate all your work here - thanks again

petr-tik avatar Mar 30 '22 22:03 petr-tik

I would like to see an ‘assert_eq’ macro

cameronbraid avatar Apr 03 '24 01:04 cameronbraid

Hello, sorry to bump this issue, but it would indeed be a great idea. Is there any update on progress?

I would like to create new git tooling with the git2 crate, and difftastic would help immensely in getting better diffs. As the code is already in Rust, having it as a crate would be much better than hacking around sub-processes.

It would also let people create bindings for Python and other languages.

Thanks in advance, have a nice day :)

ierezell avatar Nov 27 '24 16:11 ierezell

crate-ify please! Merry Christmas!

Sajjon avatar Dec 22 '24 20:12 Sajjon

Would a GoFundMe help with this issue?

skewty avatar Jun 08 '25 04:06 skewty

Wow, a lot of people have expressed an interest in this feature!

It's really hard to figure out the right API to expose as a library. I'm concerned that adding this will make it harder to work on difftastic (cargo workspaces would complicate the already involved release process, granular API visibility would now matter, internal refactorings could now become breaking changes). I don't want to do this unless people actually use the library API.

I'd really like to see some usage feedback of the JSON output, to confirm that the current design makes sense for external consumers.

$ DFT_UNSTABLE=yes DFT_DISPLAY=json difft before.js after.js

(No-one has complained yet about DFT_UNSTABLE being required, so I'm not sure anyone is actually using this output option right now.)
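For anyone who wants to try the JSON output from Rust without a library crate, here is a minimal sketch that shells out to `difft` with the two environment variables from the command above. The function names `difft_json_command` and `run_difft_json` are made up for illustration, and the sketch assumes `difft` is on `PATH`. It returns the raw process output and does not assume anything about the JSON schema, which is still marked unstable.

```rust
use std::path::Path;
use std::process::{Command, Output};

/// Build (but do not run) a `difft` invocation that asks for JSON output.
/// Hypothetical helper: only the two DFT_* variables come from the thread.
fn difft_json_command(before: &Path, after: &Path) -> Command {
    let mut cmd = Command::new("difft");
    cmd.env("DFT_UNSTABLE", "yes")
        .env("DFT_DISPLAY", "json")
        .arg(before)
        .arg(after);
    cmd
}

/// Run the command and hand back exit status, stdout and stderr unchanged.
/// The caller decides how to treat the exit status and how to parse stdout.
fn run_difft_json(before: &Path, after: &Path) -> std::io::Result<Output> {
    difft_json_command(before, after).output()
}
```

Calling `run_difft_json(Path::new("before.js"), Path::new("after.js"))` gives the JSON bytes in the `stdout` field of the result, ready to be fed to whatever JSON parser the consumer prefers.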

Open Design Questions

(1) Should the API take &[u8] arguments or &str? &str seems better, but this forces consumers to reimplement the text encoding detection that difft does.

(2) Should library users pass the filename, or explicitly specify the language expected?

(3) Alternatively, should users pass a parsed tree-sitter &Node? If the library handles parsing itself, what happens if the file is too big to parse or produces tree-sitter parse errors? If the caller provides the &Node, how does the library expose the language-specific diffing settings (see TreeSitterConfig in difftastic)?

(4) What about line-oriented diff fallbacks? Difftastic falls back to line-oriented diffs if it can't do an AST diff in a reasonable amount of time/memory. The library API would need to expose this.

(Difftastic can be very slow and using >100MiB of RAM is unfortunately common due to quadratic memory usage on normal sized programs.)
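To make questions (3) and (4) more concrete, here is a rough sketch of how a library result could tell the caller which kind of diff it got and why. Every name here (`DiffKind`, `FallbackReason`, `Limits`, `choose_diff_kind`) is hypothetical, and the decision rule is invented for illustration. It is not a description of how difftastic actually decides to fall back.

```rust
/// Which strategy produced the diff. Hypothetical type, not from difftastic.
#[derive(Debug, PartialEq)]
enum DiffKind {
    Syntactic,
    LineFallback { reason: FallbackReason },
}

/// Why a line-oriented diff was used instead of an AST diff.
#[derive(Debug, PartialEq)]
enum FallbackReason {
    TooManyBytes,
    ParseErrors,
}

/// Caller-tunable limits. The fields and their meaning are assumptions.
struct Limits {
    max_bytes: usize,
    max_parse_errors: usize,
}

/// Illustrative decision rule: check the input size first, then parse errors.
fn choose_diff_kind(
    lhs_len: usize,
    rhs_len: usize,
    parse_errors: usize,
    limits: &Limits,
) -> DiffKind {
    if lhs_len.max(rhs_len) > limits.max_bytes {
        DiffKind::LineFallback { reason: FallbackReason::TooManyBytes }
    } else if parse_errors > limits.max_parse_errors {
        DiffKind::LineFallback { reason: FallbackReason::ParseErrors }
    } else {
        DiffKind::Syntactic
    }
}
```

Returning the reason alongside the result would let a consumer that caches diffs retry with looser limits, while an interactive consumer could simply render the line-oriented output.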

(5) How do you expose syntax highlighting information? This needs to be exposed too: you definitely want the AST used during diffing to exactly match the one used for highlighting, or it'd be extremely confusing.

(6) What about blank lines? An AST diff doesn't see them, but when rendering a diff you want to align them as much as possible. Difftastic does this during the display process (see match_preceding_blanks in context.rs).

Summary

All these issues are totally surmountable, but I'm concerned that an accurate API will discourage usage due to the inherent complexity of AST diffing. I don't want to add something that makes my life harder whilst being largely unused.

Please play with the JSON API and see how it works for you, and then we will have some concrete experience to build the library API :)

Wilfred avatar Oct 29 '25 23:10 Wilfred

A little feedback from me:

  1. Maybe impl AsRef<[u8]> for str and always use bytes? That way users should be able to pass both str and bytes.
  2. My suggestion is to use an Option / optional builder method that allows specifying a language override. At the same time, I suggest making the filename required and detecting the default language from it.
  3. IMHO it's a low-level feature. Nice to have, but the basic functionality should cover passing bytes with some minimal context.
  4. I don't have an opinion on that.
  5. A builder method / bool field that enables syntax highlighting?
  6. Same as 4: I don't have an opinion on that.
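As a sketch of what points 1 and 2 could look like together: `str` already implements `AsRef<[u8]>` in the standard library, so a function taking `impl AsRef<[u8]>` accepts both strings and byte slices with no extra impl. The `DiffRequest` type, its methods, and the use of the file extension as a stand-in for language detection are all hypothetical and only illustrate the shape of such an API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical request type: required path, optional language override.
struct DiffRequest {
    path: PathBuf,
    language_override: Option<String>,
    lhs: Vec<u8>,
    rhs: Vec<u8>,
}

impl DiffRequest {
    /// Accepts `&str`, `String`, `&[u8]`, `Vec<u8>` and byte-string literals.
    fn new(path: impl AsRef<Path>, lhs: impl AsRef<[u8]>, rhs: impl AsRef<[u8]>) -> Self {
        DiffRequest {
            path: path.as_ref().to_path_buf(),
            language_override: None,
            lhs: lhs.as_ref().to_vec(),
            rhs: rhs.as_ref().to_vec(),
        }
    }

    /// Optional builder method that overrides whatever the path suggests.
    fn language(mut self, name: impl Into<String>) -> Self {
        self.language_override = Some(name.into());
        self
    }

    /// Placeholder for real detection: the override wins, otherwise the raw
    /// file extension is returned. Real language detection would do more.
    fn effective_language(&self) -> Option<String> {
        self.language_override.clone().or_else(|| {
            self.path
                .extension()
                .and_then(|ext| ext.to_str())
                .map(|ext| ext.to_string())
        })
    }
}
```

With this shape, `DiffRequest::new("src/main.rs", "fn a() {}", b"fn b() {}")` mixes a string and a byte literal freely, and chaining `.language("rust")` replaces the guess derived from the path.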

Dzordzu avatar Nov 04 '25 12:11 Dzordzu

So I can actually give some feedback here I think. I have been using a patched version of difftastic for about a year now to render HTML diffs for my own little git-viewer thingy:

https://git.rlnm.net/libs/linux/commit/2527a16226d20795007acbdfc1fd474e05a945a0/changes/9727ab0ddb7b1c5ae551a432fc2205b312bc2d74

The main reason I patched difftastic was to get point 6) to work and output alignment info for each chunk of changed lines. Otherwise it appeared to be impossible to reconstruct a complete diff, as I could never quite align everything correctly. The reason I never sent a patch is that I did it very hackily, and while the result works, it definitely isn't pretty.

To your other points:

  1. This depends heavily on the amount of info the library would output. If we get all the information needed to reconstruct the full file, including all unchanged lines, the most flexible input would probably be a Read or &[u8]. If (as currently happens) we only get changed segments, we would probably need to pass lines as &[&str] to ensure that the display and diffing code agree on where the line breaks are and what the encoding of the content is.
  2. I'd probably prefer to pass a filename if we also pass a bag of bytes. Then the API is analogous to passing file contents and file metadata, and the consumer does not need to deal with interpreting the file contents at all.
  3. Exposing a way to directly pass a tree-sitter &Node instinctively sounds nice, but I would consider this something that could be added later, especially if it would require exposing additional parsing settings. It's also not needed for my use case.
  4. It's helpful if line-based fallbacks work automatically. That said, I would love to be able to tweak the memory/time limits: I cache the diff output, so I would be fine with giving the library more time and memory to find a solution.
  5. This is actually an interesting point. I currently just run syntax highlighting using syntect over the text and then highlight the changed parts based on the output difftastic provides me. This works well enough, even if it is somewhat finicky, so I'm not sure syntax highlighting needs to be a concern for the library.

UhhhWaitWhat avatar Nov 28 '25 20:11 UhhhWaitWhat