boltons icon indicating copy to clipboard operation
boltons copied to clipboard

remerge - dictionaries with arrays as values fail (slightly)

Open pleasantone opened this issue 8 years ago • 8 comments

I forked your gist to https://gist.github.com/pleasantone/c99671172d95c3c18ed90dc5435ddd57. The fork fundamentally adds no new code, other than extremely verbose debugging prints to simply record the problem without anyone needing to single-step through with a debugger.

The significant change in the code is the addition of 'zones': [{'a': 1}], to overlay.

which produces

{0: {'a': 1},
 'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
               'persistence': {'host': '10.2.2.2', 'port': 5433}},
 'host': '127.0.0.1',
 'overlay_version': '5.0',
 'port': 8080,
 'zones': <Recursion on dict with id=140392218983688>}

The output gets interesting right here:

path=(), key=zones, value=[{'a': 1}]
new_parent=[], new_items=<enumerate object at 0x7faf9c57e0d8>
updating new_parent to ret={'host': '127.0.0.1', 'endpoints': {'cache': {'host': '127.0.0.1', 'port': 8889}, 'persistence': {'host': '127.0.0.1', 'port': 8888}}, 'port': 8000}
KeyError
returning new_parent={'host': '127.0.0.1', 'endpoints': {'cache': {'host': '127.0.0.1', 'port': 8889}, 'persistence': {'host': '127.0.0.1', 'port': 8888}}, 'port': 8000}, new_items=<enumerate object at 0x7faf9c57e0d8>
path=('zones',), key=0, value={'a': 1}
new_parent={}, new_items=ItemsView({'a': 1})
KeyError
returning new_parent={}, new_items=ItemsView({'a': 1})
path=('zones', 0), key=a, value=1
new_parent=1, new_items=False
KeyError
returning new_parent=1, new_items=False

new_parent should not be {}

pleasantone avatar Jun 15 '16 00:06 pleasantone

Sorry for the delay, but I think I got it. I love a good bug like this; there's just one right answer and it's a tiny change. I forked your fork of my gist and added to the printing to get the answer. That linked version, as well as the original gist, both have the fix.

So, first of all, the bug is in fact in remerge, not in remap or get_path. That was a relief.

You were very close to the right answer. The critical part is around new_parent=[] where new_parent gets set, updating new_parent to ret={...}. This is how 0 ends up as a key in the return. The enumerate object should be going into a new_parent that is a blank list, not the root dictionary.

So -- SPOILER -- the fix was to tighten the condition around when the new_parent gets set to ret. Turns out it's not just that the path is Falsey, but also that the key is None.

I'll be the first to admit remerge is still not perfect. This has re-opened my eye to the issue of "what if None is an actual key in the dictionary." But that is a bug I'll have to think on some more.

Does all that make sense? Hopefully this was more fun than fumble. :)

mahmoud avatar Jun 15 '16 10:06 mahmoud

Very cute! Thanks for nailing it, the recursion was driving me a little crazy.

pleasantone avatar Jun 15 '16 19:06 pleasantone

Absolutely! It should be mentioned that lists are a little funky with this. The default remap always extends lists, adding items, and this remerge updates items fully recursively (it doesn't stop at a list boundary). So what you get with multiple zones is update + append, which yields pretty weird results, I must admit. It still works well for lists of scalars (strings, ints, etc.), though.

mahmoud avatar Jun 15 '16 19:06 mahmoud

Actually, there's still an issue, when you make zones [{'a': 1}, {'b':2}] we get:

'zones': [{'a': 1}, {(1, 2), (0, 'b')}]}

pleasantone avatar Jun 15 '16 19:06 pleasantone

Yeah, I think the behavior should be to extend the lists, as opposed to a list of lists.

pleasantone avatar Jun 15 '16 19:06 pleasantone

Whoa, that's not expected. I was getting [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}] in my test (updated + appended). That's probably not the behavior you want either. May need to take a step back and write some example cases.

So the one we just solved:

base = {'zones': [{'a': 1}]}

overlay1 = {'zones': [{'a': 1}]}

>>> remerge([base, overlay1]) 
{'zones': [{'a': 1}]} 

And now the one we need to solve:

overlay2 = {'zones': [{'b': 2}]}

>>> remerge([base, overlay2]) 
{'zones': [{'a': 1}, {'b': 2}]}  # desired

See I think these behaviors are mutually exclusive. One updated in-place, and the other appended. If we were to make it behave like the latter, then the result would become [{'a': 1}, {'a': 1}]. While the {'a': 1} dictionaries may look the same to us, remap/remerge has no way of knowing that. So, what to do?

mahmoud avatar Jun 15 '16 19:06 mahmoud

Well after a detour with CalVer, I remembered this thread and decided that yes, the additive lists are almost certainly the best default behavior in this case. So, the gist is updated.

Thank you again very very much for helping me work/think through that case. It may not be perfect, but I hope it's useful. :)

mahmoud avatar Jun 24 '16 22:06 mahmoud

@mahmoud Do you plan to merge (something like) this into boltons?

Recursive merge is such a many-sided concept. Instead of fixing a certain behavior (that will suffice for 80% of people but fail for 20%), it would be better to allow configuring the behavior (and choose sensible defaults):

  • Basic: Merge or overwrite keys?
  • Advanced: List merge strategy (overwrite, additive merge, merge objects)?
  • Whatever may arise

dpath library supports configuring behavior through flags (see: https://github.com/akesterson/dpath-python/blob/master/dpath/util.py#L161). It's simple and seems nice.

tuukkamustonen avatar Sep 25 '16 10:09 tuukkamustonen