latexdiff

Detecting moved sections

Open briochemc opened this issue 7 years ago • 7 comments

Is there a way to make latexdiff detect when whole sections have been moved around?

briochemc, Oct 30 '18 01:10

I have been thinking about this for a long time, but it is quite difficult to do in a useful way. Unlike in Word, there is no access to the editing process (which can sometimes be a good thing). And if, for example, a whole paragraph was moved and then one or two words were changed, it should still appear as a moved paragraph with some edits.

So for now probably not feasible, unfortunately.

ftilmann, Mar 30 '19 10:03

It could work on a per-paragraph basis, trying to find 1:1 mappings of the closest corresponding paragraphs and calculating the differences between them.
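A minimal sketch of that per-paragraph idea, in Python rather than Perl and not tied to latexdiff's internals. The function name `match_paragraphs`, the greedy pairing strategy, and the `threshold` value are all assumptions made for illustration; the standard-library `difflib.SequenceMatcher` stands in for whatever similarity measure one would really use.

```python
from difflib import SequenceMatcher


def match_paragraphs(old_paras, new_paras, threshold=0.6):
    """Greedily build a 1:1 mapping {old index: new index} of similar paragraphs.

    Similarity is the SequenceMatcher ratio over whitespace-separated words.
    Pairs scoring below `threshold` (an arbitrary, hypothetical cut-off) are
    left unmatched, i.e. treated as plain additions or deletions.
    """
    scored = []
    for i, old in enumerate(old_paras):
        for j, new in enumerate(new_paras):
            ratio = SequenceMatcher(None, old.split(), new.split()).ratio()
            if ratio >= threshold:
                scored.append((ratio, i, j))

    # Best-scoring pairs first; ties are broken by position for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    pairs, used_new = {}, set()
    for _, i, j in scored:
        if i not in pairs and j not in used_new:
            pairs[i] = j
            used_new.add(j)
    return pairs


old = [
    "The quick brown fox jumps over the lazy dog.",
    "Second paragraph about results and data.",
    "Conclusions are drawn here in the end.",
]
new = [
    "Conclusions are drawn here in the end.",
    "The quick brown fox leaps over the lazy dog.",
    "Second paragraph about results and data.",
]
mapping = match_paragraphs(old, new)
# A pair (i, j) with i != j is a candidate "moved" paragraph; the two
# paragraphs could then be diffed against each other to mark the small edits.
moved = {i: j for i, j in mapping.items() if i != j}
```

The greedy pass compares every paragraph with every other one, so it is quadratic in the number of paragraphs. That is acceptable for a sketch but would probably need pruning for long documents.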

flying-sheep, Jan 14 '20 13:01

Thanks for the suggestion. It is still not so quick to do in practice (or do you know of an algorithm implemented in Perl that does fuzzy differencing of tokenized text?). I have another idea for how one could 'fake' such functionality: look for exact matches between added and deleted blocks of a certain length. That would probably work in many instances, but even implementing this requires changing several parts of the very core of latexdiff, so it is not something I will undertake any time soon.
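To make the 'fake' approach concrete, here is a rough sketch in Python (not Perl, and not latexdiff code). It runs an ordinary diff, collects the deleted and added blocks, and reports any deleted block that reappears verbatim as an added block. The name `find_moved_blocks` and the `min_len` parameter are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for latexdiff's own differencing.

```python
from difflib import SequenceMatcher


def find_moved_blocks(old_tokens, new_tokens, min_len=3):
    """Return (old_start, new_start, length) for blocks that were deleted in
    one place and re-added, token for token, somewhere else.

    Only exact matches of at least `min_len` tokens count, so a moved block
    that was also edited is NOT detected by this sketch.
    """
    matcher = SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append((i1, tuple(old_tokens[i1:i2])))
        if tag in ("insert", "replace"):
            added.append((j1, tuple(new_tokens[j1:j2])))

    moves, used = [], set()
    for i1, block in deleted:
        if len(block) < min_len:
            continue
        for k, (j1, candidate) in enumerate(added):
            if k not in used and candidate == block:
                moves.append((i1, j1, len(block)))
                used.add(k)  # each added block can explain only one deletion
                break
    return moves


old_tokens = "A B C D E F G H".split()
new_tokens = "E F G A B C D H".split()
moves = find_moved_blocks(old_tokens, new_tokens)
```

Here the run `E F G` is deleted from position 4 of the old sequence and inserted at position 0 of the new one, so it is reported as a single move instead of an unrelated deletion and addition.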

ftilmann, Jan 16 '20 14:01

do you know of an algorithm implemented in perl that does fuzzy differencing of tokenized text?

Sorry, I looked into Perl once in 2009 and decided to learn Python instead.

Doing what you said won’t be any more or less fake than what any diff tool does; they’re all heuristics by necessity.

flying-sheep, Jan 16 '20 15:01

FWIW, I found one implemented in JS here. Apparently the algorithm is called the Heckel method; read more here.
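For readers unfamiliar with it, below is a simplified Python sketch of a Heckel-style matching pass, written from a general understanding of the method rather than from the JS implementation mentioned above. Tokens that occur exactly once in both versions serve as anchors, and the matches are then extended forward and backward over identical neighbours. The function names `heckel_match` and `blocks` are made up for this example, and the sentinel begin/end lines of the original description are omitted.

```python
from collections import Counter


def heckel_match(old, new):
    """Return a list mapping each index in `new` to its index in `old`, or None."""
    old_count, new_count = Counter(old), Counter(new)
    old_index = {tok: j for j, tok in enumerate(old)}  # only used for unique tokens
    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    def link(i, j):
        new_to_old[i] = j
        old_to_new[j] = i

    # Anchor pass: a token occurring exactly once in each version must match itself.
    for i, tok in enumerate(new):
        if new_count[tok] == 1 and old_count[tok] == 1:
            link(i, old_index[tok])

    def try_extend(i, j, step):
        ni, nj = i + step, j + step
        if (0 <= ni < len(new) and 0 <= nj < len(old)
                and new_to_old[ni] is None and old_to_new[nj] is None
                and new[ni] == old[nj]):
            link(ni, nj)

    # Forward pass, then backward pass: grow matches over equal neighbours.
    for i in range(len(new)):
        if new_to_old[i] is not None:
            try_extend(i, new_to_old[i], +1)
    for i in range(len(new) - 1, -1, -1):
        if new_to_old[i] is not None:
            try_extend(i, new_to_old[i], -1)
    return new_to_old


def blocks(new_to_old):
    """Group consecutive matches into (new_start, old_start, length) runs."""
    runs = []
    for i, j in enumerate(new_to_old):
        if j is None:
            continue
        if runs and runs[-1][0] + runs[-1][2] == i and runs[-1][1] + runs[-1][2] == j:
            runs[-1] = (runs[-1][0], runs[-1][1], runs[-1][2] + 1)
        else:
            runs.append((i, j, 1))
    return runs


old_lines = ["intro", "", "body", "", "end"]
new_lines = ["body", "", "intro", "", "end"]
matched = heckel_match(old_lines, new_lines)
runs = blocks(matched)
```

A run whose old and new start positions differ is a candidate moved block. Deciding which of two swapped blocks "moved" and which stayed put is left open here; that choice is itself a heuristic.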

Or am I off-track?

apYdr6uxv, Jan 30 '20 14:01

Thanks for leaving these hints. It looks like a promising approach but would replace the current diffing algorithm (at least optionally) and thus require quite a lot of coding to implement within the latexdiff context.

ftilmann, Jan 30 '20 14:01

Of course! I did not mean to imply it makes it any easier. 🙇

apYdr6uxv, Jan 30 '20 14:01