nanomorph icon indicating copy to clipboard operation
nanomorph copied to clipboard

implement Meyers diffing algorithm

Open yoshuawuyts opened this issue 8 years ago • 31 comments

So we have a slight diffing problem — not only are we fairly slow at updating, reordering nodes is also suboptimal. This means that some reorders can be surprising at the worst case, and in a worst case reorders simply take longer than they should.

In computer science this is known as "diffing", and is a rather well-studied problem. Probably the best known algo for this is the Mayers Diffing algorithm. It'd be neat if we could make use of this for nanomorph.

references

  • https://github.com/yelouafi/petit-dom/blob/master/src/vdom.js
  • https://neil.fraser.name/writing/diff/
  • http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.4.6927&rep=rep1&type=pdf
  • https://www.codeproject.com/Articles/42279/Investigating-Myers-diff-algorithm-Part-of

yoshuawuyts avatar Oct 10 '17 20:10 yoshuawuyts

@graforlock did you get a chance to look at this?

yoshuawuyts avatar Oct 20 '17 15:10 yoshuawuyts

Yeah @yoshuawuyts made some steps to extract it along with pre-optimisations that can be applied to nanomorph, but the newborn put me off the track :) Will probably progress & finalise it this following weekend if this attempt is successful and performant enough

graforlock avatar Oct 20 '17 19:10 graforlock

@graforlock ohhhhh, congratulations! :tada: :tada: :tada: :tada:

No rush at all; great to hear you've been successful, and like hope you're having a good time! :grin:

yoshuawuyts avatar Oct 20 '17 23:10 yoshuawuyts

Myers diffing for the dom is totally tractable. @graforlock looking forward to your implementation.

blahah avatar Oct 22 '17 21:10 blahah

Problems might be with numbers of pre-optimisations and we need quite a bit of them, that will appear in the forthcoming performance tests. DOM node getters and setters should be used sparsely/wisely as they are quite slow in bulk (vs comparing vals/keys on virtual objects), and when comparing real DOM trees - this needs special attention as any excessive and reckless use of aforementioned slows the computation down.

graforlock avatar Oct 22 '17 22:10 graforlock

@graforlock I've been thinking and reading about the problem and I actually think Myers is unlikely to be the best approach. It's good for the general diff problem, but for a DOM tree there are specialised algorithms.

screen shot 2017-10-23 at 00 03 38

http://referaat.cs.utwente.nl/conference/3/paper/7144/change-detection-in-xml-trees-a-survey.pdf

The DOM is a tree with ordered siblings, and we want the operations:

  • insert
  • delete
  • update
  • move

So from the list above, the DiffXML might be worth looking at.

blahah avatar Oct 22 '17 23:10 blahah

Myers is a groundwork for a lot of the (performant) other algorithms including some tree based diffs. Its also proven to do good work in many libraries that operate on trees. Yes, it is not specialised by default but there are specialised and well-optimized implementations at hand.

graforlock avatar Oct 23 '17 16:10 graforlock

I say try them all, but ship as iterations pass the test suite. Good news is we can try out different approaches in branches now that we have a baseline and make incremental improvements.

kristoferjoseph avatar Oct 23 '17 18:10 kristoferjoseph

@graforlock sounds good to me 👌

blahah avatar Oct 23 '17 22:10 blahah

@blahah or anyone, I struggle to find a whole lot of time lately, so I'm progressing slowly. Although I have good suites and understanding how to make it work well, we could pair up on that can't we?

I realise there isn't immediate rush but it would be cool if this was done anyhow.

graforlock avatar Oct 25 '17 19:10 graforlock

@graforlock maybe if you could create a repo, people could like dig in and help out? I'd love for this to make some progress — there's not a lot of pressure, but still, the sooner we'd get a version working the better :sparkles:

yoshuawuyts avatar Oct 26 '17 14:10 yoshuawuyts

yeah, a repo for a pure myers-diff-dom or something would be ideal, then we can collab in there?

blahah avatar Oct 27 '17 00:10 blahah

yeah will do, a good idea

graforlock avatar Oct 28 '17 09:10 graforlock

@graforlock yay! — looking forward to it! :sparkles:

yoshuawuyts avatar Oct 30 '17 12:10 yoshuawuyts

should do tomorrow if anything goes well, need to polish some edges and README-this-stuff for directions

graforlock avatar Oct 30 '17 23:10 graforlock

So, heres the repo @yoshuawuyts @blahah @kristoferjoseph

It is essentiallypetit-dom algorithm ported/adapted from VDOM to a DOM diffing methodology. A lot of code is similar.

A lot of stuff is also just opinionated result of the port (like input updates etc), but the algorithm itself (all the diff prefix methods for example) is totally extractable to the separate module. Sorry its all in one file atm! It has to be changed definitely!

It may be that I wasted huge amount of mine and your time and its all pile of crap! Or not... 😄

Preliminary results give me a huge performance advantage over nanomorph (as you can see in example)

https://github.com/graforlock/diffing-implementation

graforlock avatar Nov 01 '17 01:11 graforlock

So my thoughts are, after dealing with this stuff for some time.

Firstly, traversing the actual DOM tree for a tree diff is costly. Myers is super slow (with all the diff traceback it does by default) and won't perform well over certain operations cap. petit-dom is bypassing it by switching to a generic LCS algorithm, which is faster, but in nanomorph's case still too slow. Only makes performance difference with equal number of nodes, other than that, seems to have no edge over what nanomorph is currently doing.

There is a need for super performant and greedy LCS algortihm. I myself found a possible candidate but this is not that easy on the actual DOM tree without breaking the API. Proxy nodes is another thing to consider - their swap index has to be keyed in a reorder map given they would change their position in the DOM tree, remaining the same in the end. Not much of a walk in the park.

graforlock avatar Dec 04 '17 16:12 graforlock

hey all! popped up in my dash today so just dropping this in, in case it is at all relevant → https://github.com/WebReflection/domdiff

jongacnik avatar Dec 04 '17 20:12 jongacnik

Looks really cool, how's performance of it ? Did you try to measure it?

graforlock avatar Dec 04 '17 21:12 graforlock

@graforlock not yet, just a quick glance over source. Have a feeling you, @blahah, or @yoshuawuyts would be able to discern a bit more than myself!

jongacnik avatar Dec 04 '17 22:12 jongacnik

Good discussion about diffing: https://github.com/snabbdom/snabbdom/issues/348 Think it's worth mentioning here.

soyuka avatar Dec 10 '17 12:12 soyuka

not sure I'm adding anything to this discussion but for what I can tell with my tests, the petit-dom algo seems to be the fastest but there are too many operations that non virtual DOM wouldn't need, and compared to snabbdom is quite bigger, in terms of final size ( domdiff is 524 bytes in gzip and 471 bytes via brotli, as example ).

domdiff also wouldn't care less about patches and it works linearly with a list of nodes, it's not suitable for nested nodes too, just lists of the same parent node.

WebReflection avatar Dec 11 '17 19:12 WebReflection

Heya! :tada: Following this long time. And as i once mentioned, i once wanted to rewrite it from scratch - for learning, playing and minimizing the size. So code name was "minmorph" - removed the "./events" list, the remaining is almost the same with few optimizations. Actually not sure why we need to list all the events?

Long story short, back then i tested it against nanomorph's tests and only 1 tests was failing. Recently was touching it more and following the need for benchmarking, i got the @graforlock repo and put it there so we can benchmark. Also added the nanomorphs tests.

And so, the result is all of the normal tests are passing and only 8 of the "keyed" tests are failing! The benchmark results are phenomenal :) I'm going to create a clean repo with everything, plus @graforlock's code and benchmark. Then i'll PR the implementation so we can discuss and continue on fixing these 8 tests, because i'm not so familiar with the keyed things :D

The tests are green (almost) 2018-02-02-02 34 40_1280x1024_scrot

minmorph benchmark - numbers are the same as the @graforlock's not finished code, which in turn fails pretty much tests. 2018-02-02-02 35 04_1280x1024_scrot

Nanomorph benchmark 2018-02-02-02 34 53_1280x1024_scrot

tunnckoCore avatar Feb 02 '18 00:02 tunnckoCore

So, that's it https://github.com/tunnckoCore/demo-minmorph

yarn install
yarn dev

And open http://localhost:5000. It uses parcel-bundler to run on the browser btoh the tests and benchmarks. Tests are at the src/app/test.js and not in the test/ at the root. For trying the tests against nanomorph go to the file and pass nanomorph to the abstractMorph and abstractKeyed instead of minmorph.

One interesting keyed diffing https://github.com/joshrtay/key-diff (similar to domdiff).

tunnckoCore avatar Feb 07 '18 10:02 tunnckoCore

Cool stuff, I am going to have a look.

I myself also experimented with "mostly" accurate LCS algorithm (Sift4), and honestly I don't think we need proper 100% accurate diffing algorithms with backtrace provided we have keyed results reordering sorted out properly. Reason being that the time for calculating such accuracy is expensive and outweights simple removal / addition operations.

For example, I did extract basic Myers in agnostic fashion here: https://github.com/graforlock/myers-algorithm and this was literally super slow (expensive).

Sift seemed to be blazing fast and I may put some more work into that for greater good.

graforlock avatar Feb 07 '18 11:02 graforlock

Heya all! Just for the record of my work. Last night i started playing again on the above mentioned repo demo-minmorph. It's now a lot more organized and cleaned (reorganized and apply DRY on tests for #87), and deps are updated. Please check it out and try running it.

Tests are increased as number, failing tests are decreased to only 2 (5 assertions). Related to this is https://github.com/tunnckoCore/demo-minmorph/issues/14 ("no recursion step in domdiff") and thoughts from @WebReflection. Why tests are increased? They now are 724, because i switched to #104, now they are around 98 events (on chrome) instead of 30-35 hardcoded.

Only 8 of them are failing and are excluded from the suite

function excludeEvents(name) {
  // failing event tests for Minmorph
  // Nanomorph fails for another ones and some of these
  const isFailing = [
    'onresume',
    'onfreeze',
    'onreadystatechange',
    'onpointerlockchange',
    'onpointerlockerror',
    'onselectionchange',
    'onvisibilitychange',
    'onsecuritypolicyviolation',
  ].includes(name);

  return !isFailing;
}

Most of the keyed tests are passing, except 2 (handle non-id elements and nested without id) which i marked as TODO: PRs welcome for easier search.

Benchmarks (yarn bench) are against Nanomorph@6 beta using the @graforlock's benchmark from ... hm, from where i got it (some repo). It's not something so special, but it shows that my implementation using a bit changed domdiff is around 3-5 times faster than Nanomorph. I'm still not sure for that numbers, because of the failing tests.

Please share some feedback. I'll try to PR things one by one.

Cheers :clinking_glasses:

tunnckoCore avatar Jun 23 '18 04:06 tunnckoCore

Hi guys,

what is the status on that?

brucou avatar Jan 14 '19 18:01 brucou

@brucou hey! -- I've casually started working on a WASM version, but there's no ETA when this might be done.

yoshuawuyts avatar Jan 16 '19 12:01 yoshuawuyts

Alright, thanks for the info!

On 1/16/19, Yoshua Wuyts [email protected] wrote:

@brucou hey! -- I've casually started working on a WASM version, but there's no ETA when this might be done.

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/choojs/nanomorph/issues/85#issuecomment-454764559

--

brucou avatar Jan 16 '19 16:01 brucou

@brucou the demo-minmorph is there, you can see performance and issues there ;] I didn't looked on it for a while.

tunnckoCore avatar Jan 17 '19 17:01 tunnckoCore