reflex icon indicating copy to clipboard operation
reflex copied to clipboard

`diffIntMap`, or `diffAlign` more generally

Open endgame opened this issue 6 years ago • 8 comments

I recently found myself looking at Data.Map.Misc.diffMap and wanting something similar for IntMaps.

There is nothing in the implementation of diffMap that requires it work over Map k v only: I think you can generalise it to: diff :: (Align f, Filterable f, Eq a) => f a -> f a -> f (Maybe a) and similarly for diffNoEq.

I'm not sure of the best place to put this, but I'm happy to write it. Current thoughts: under reflex at Data.Align.Misc or Data.Align.Diff, or under a new semialign-diff package* under Data.Align.Diff. reflex still depends on these < 0.9, so we'd need to backport it into reflex anyway (semialign split off from these at these-1).


* Neither semialign nor these depend on each other, so a package that depends on both becomes necessary.

endgame avatar Jun 29 '19 05:06 endgame

https://github.com/reflex-frp/reflex/issues/301 So do note that we do already have Semialign instances. It might be possible to add a Cabal flag for older or newer these? This is gross but at least it's the same modules (I think?) just from different packages, so no new CPP will be needed?

Not saying this is the right approach, since it is more surface to maintain. Just wanted to point out it is possible.

Ericson2314 avatar Jul 09 '19 20:07 Ericson2314

Thanks for your thoughts. I haven't thought much about this as my laptop died recently, but I was wrong in my initial issue: semialign depends upon these, so I think the best thing is for me to try and PR a new Data.Semialign.Diff module there with a diffAlign function and two diffAlignNoEq variants (one to count These as a change, and one to not).

Once they're in a public version of semialign and reflex is patched to handle the newer packages, we can alias Data.Map.Misc.diffMap = Data.Semialign.Diff.diffAlign with a deprecation notice and roll on from there.

endgame avatar Jul 09 '19 21:07 endgame

I remember the problem. Neither semialign nor filterable depend on each other. I have filed https://github.com/isomorphism/these/issues/115 to find out whether I can add the filterable dependency and put this in semialign. If not, I'll probably set up a semialign-diff.

endgame avatar Jul 12 '19 23:07 endgame

I have filed isomorphism/these#116 , which provides the generic versions of diff (generalised diffMap), diffNoEq (generalised diffMapNoEq) and patch (generalised applyMap; I think the asymptotics for patch are the same as applyMap but it might be slightly slower as we can't use Map.union and Map.difference there).

endgame avatar Jul 13 '19 08:07 endgame

The PR on isomorphism/these was closed as witherable adds a transitive dependency on lens. I'm playing with these ideas in https://github.com/qfpl/semialign-diff .

I don't yet have a good story for patching indexed diffs and I'm going to be pretty busy for the next week or so, so I won't be imminently releasing to hackage.

endgame avatar Sep 08 '19 12:09 endgame

I've had a bit of a chat to @ryantrinkle on freenode#reflex-frp about some of this (I'm jackdk on IRC), but I need something longer-form to explain everything, and the tradeoffs involved.

The Problem

I want to generalise this function from Reflex's Data.Map.Misc:

diffMap :: (Ord k, Eq v) => Map k v -> Map k v -> Map k (Maybe v)
diffMap olds news = flip Map.mapMaybe (align olds news) $ \case
  This _ -> Just Nothing
  These old new
    | old == new -> Nothing
    | otherwise -> Just $ Just new
  That new -> Just $ Just new

Note that we're walking the map "twice": once for the align and once for the Map.mapMaybe, though laziness will interleave them, I think. My goal is to make the generalisation no worse than this.

The Map.mapMaybe could well be the generalised Filterable#mapMaybe (from the witherable package). The naive diff :: (Filterable f, Semialign f, Eq a) => f a -> f a -> f (Maybe a) gives really strange results for f ~ [] (and possibly others). The problem is that the Semialign/Filterable generalisation requires that the record of changes uses the same f as the structures being compared. Note also that the kinds involved mean that this will not work for DMap. That's a non-goal. I'm also not trying to replace the Patch typeclass.

Diffing with a Fold

If we instead fold over the result of aligning the structures to be compared, we get the diff I'm currently most happy with:

diff
  :: forall p f a .
     ( FoldableWithIndex (Index p) f, Semialign f
     , Eq a
     , AsEmpty p, At p, IxValue p ~ (Maybe a)
     )
  => f a -> f a -> p
diff = diffWith $ \case
  This _ -> Just Nothing
  That new -> Just $ Just new
  These old new
    | old == new -> Nothing
    | otherwise -> Just $ Just new

diffWith f = (ifoldr step Empty .) . align
  where
    step k = set (at k) . f

This lets me do nifty things like diff two [a]s into an IntMap a, which I can't do with the Semalign/Filterable version, and makes the silly diffing into [Maybe a] impossible, as there's no At instance for [a].

Efficiency

I showed this version to @ryantrinkle on freenode#reflex-frp, and he said:

now, if you could also extend align so that it can support these operations: http://hackage.haskell.org/package/containers-0.6.2.1/docs/Data-Map-Merge-Lazy.html then i think you plausibly have a general way of doing diffing and patching of functors with possibly ideal asymptotics

The most general function in that toolbox is mergeA, which is (morally, if you handwave through the WhenMissing and WhenMatched types something like:

mergeA
  :: (Applicative f, Ord k)
  => (a -> f (Maybe c))
  -> (b -> f (Maybe c))
  -> (a -> b -> f (Maybe c))
  -> Map k a
  -> Map k b
  -> f (Map k c)

You can generalise this to use (Witherable t, Semialign t) instead of Map k, but to be able to access all the shortcuts available to the Map implementation I had to put together a pretty monstrous typeclass: https://github.com/qfpl/semialign-diff/blob/2228281011ca0f4fe815ab224a5981f212ca5d9b/semialign-merge/src/Data/Semialign/Merge/Indexed.hs

(Remark: @gwils has been poking me in the back for about a week now, saying "stop this madness!". If you think class MergeWithIndex was bad, the dead-ends were even worse.)

There are a few things to note:

  1. If you want the Merge/MergeWithIndex classes to witness the isomorphism between WhenMatched and the functions it represents, you can't make Map k an efficient instance of Merge, using the tools in Data.Map.Merge.Lazy (which I don't have right now, but assume it's MergeWithIndex, without the index). This is because the WhenMatched and WhenMissing in Data.Map.Merge.Lazy represent functions that accept a k argument.
  2. There aren't that many instances for such a niche class. There's Maybe; Map k and IntMap from containers; maybe HashMap from unordered-containers (but that doesn't have a merge interface (yet?)); the monoidal-containers wrappers for the maps; Product and Compose probably; [] (which won't be that much of a win); Seq; Vector; and... that's about it.
  3. Any diff you build on top of Merge or MergeWithIndex is going to do silly things for []. I don't like that.

Where Next?

I think the version of diff that does align-then-ifoldr is the best I've found. It's still doing two "walks" over the structure, so we're no worse off than diffMap. It has types that describe the effect of diffing much better than align-then-mapMaybe, and good symmetry with a patch that folds over the patch (which is likely to be small).

It does mean that we lose the ability to diff Event t as into an Event t (Maybe a), but I'm not sure that's a great loss.

Is semialign-diff actually going to be a useful package? Will its diff even have a chance of replacing diffMap in reflex?

There's another option, which is to make a semialign-merge package, but instead of trying to generalise all the high-performance Map-specific stuff, instead witness the useful combinations of Semialign and Functor/Filterable/Witherable:

merge
  :: Semialign t
  => (a -> c) -> (b -> c) -> (a -> b -> c)
  -> t a -> t b -> t c

mergeA
  :: (Applicative f, Semialign t, Traversable t)
  => (a -> f c) -> (b -> f c) -> (a -> b -> f c)
  -> t a -> t b -> f (t c)

mergeMaybe
  :: (Semialign t, Filterable t)
  => (a -> Maybe c) -> (b -> Maybe c) -> (a -> b -> Maybe c)
  -> t a -> t b -> t c

mergeMaybeA
  :: (Applicative f, Semialign t, Witherable t)
  => (a -> f (Maybe c)) -> (b -> f (Maybe c)) -> (a -> b -> f (Maybe c))
  -> t a -> t b -> f (t c)

imerge
  :: (FunctorWithIndex i t, Semialign t)
  => (i -> a -> c) -> (i -> b -> c) -> (i -> a -> b -> c)
  -> t a -> t b -> t c

-- imergeA, imergeMaybe, imergeMaybeA all similar.

With that in play, semialign-diff can then provide diffFold and diffMerge, that build up the patch in different ways, depending on the typeclasses you have at hand.

Note that all of these approaches are still walking the structure twice, so we're not really worse off compared to where we started. I just don't want to build something that won't get used, so I'm looking for a sounding board.

endgame avatar Oct 24 '19 07:10 endgame

I just released semialign-extras, which provides a generalised diff for semialigns: https://hackage.haskell.org/package/semialign-extras-0.1.0.0/docs/Data-Semialign-Diff.html#v:diff

A rewrite rule could potentially replace calls to diff with optimised ones using the map interface, but I haven't been able to make that work: qfpl/semialign-extras#3

endgame avatar Nov 15 '19 02:11 endgame

@ryantrinkle If you know someone who is good at rewrite rules, this would get me the generic diffing I wanted, and the faster map operations that you wanted.

endgame avatar Nov 15 '19 05:11 endgame