diff reorganize into smaller packages

I have been thinking about making a push to get this to v1.0. The main problem is the API. The current API is unwieldy: it is a blend of high level helpers and low level data structures, and doing anything useful with it requires a half dozen lines.

I propose to break this up into multiple packages.

The top level package (github.com/pkg/diff) will contain at most a few functions that do the most commonly requested operations, such as read two files and emit a unified diff. It may also contain a bunch of options that can be passed to those functions. The implementation will be mostly glue.

The subpackages will contain lower level data structures, algorithms, and helpers. These can be used by people who want fine-grained control over everything or who want to do something unusual (like me).

Here's a very rough, incomplete sketch of packages and exported funcs/types/vars, to give a flavor:

diff: Files, Readers
diff/edit: Script, Range, Op
diff/myers: Diff, Pair
diff/write: Options, Color, Names, WriterToA, etc.
diff/unified: Write
diff/sidebyside (new, WIP): Write
diff/header (new, WIP, names are just placeholder mnemonics for me): Regexp, LSP, GNU

Opinions on this general direction, @mvdan?

Dec 29 '19 18:12 josharian

I've taken a stab at this in the reorg branch. It doesn't really make sense to make a pull request, since the diff is so large as to be inscrutable. Anyone who is interested/willing, please:

- pull down the branch, and poke around a bit with godoc or go doc or the like
- you can see the readme rendered at https://github.com/pkg/diff/tree/reorg
- check whether your use of the module would be significantly nicer or worse as a result of this reorganization
- read the commit message of https://github.com/pkg/diff/commit/a78b2397ab6b2815d9e61ad1019a8ed1736daf51, in which I agonize over an API design decision

Many thanks!

Dec 31 '19 00:12 josharian

I definitely agree with the idea here. I think the vast majority of users won't care about diff algorithms or writing the glue code themselves. And I agree that mixing high-level and low-level APIs in a single godoc will be messy, at least as long as we don't have a good way to define sections of documentation.

As for a single write package versus many:

"One motivation for separate packages is that the ideal interface for different formats is different."

I'm not sure I follow. I'd only split a single package into many if it was very large in API or source code, or if the qualified names got significantly better, or if the APIs didn't belong together at all.

It doesn't seem like we clearly fall under any of those. I don't think it's clear that we should use a single package either, but when in doubt, I think one package is fine. If v1 comes and goes and we end up with a much larger write package, perhaps it's clearer then that we could use multiple packages.

Jan 07 '20 07:01 mvdan

I also think that separate packages for diff algorithms make sense, even though separate packages for "writing" algorithms don't necessarily. A diff algorithm can get really complex, to the point that we might need diagrams or step-by-step examples to explain what the code is doing. I don't think we'd ever get to that point with an algorithm to render a computed diff for humans.

Jan 07 '20 07:01 mvdan