Asymptotic complexity of keyed functions.

Open · wrengr opened this issue Dec 09 '21 · 0 comments

All the functions that reconstruct keys (traverseWithKey, mapBy, contextualMapBy, foldrWithKey, toList, keys, etc.) suffer a quadratic slowdown because of how the keys are rebuilt. I can make some changes to reduce the non-asymptotic factors; but so far as I can tell, actually fixing the asymptotics is not possible under the current design.
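
To make the problem concrete, here is a minimal sketch of the kind of prefix accumulation that causes the blow-up. The `MiniTrie` type and `keysNaive` are simplified illustrations invented for this example, not the library's actual internals or code:

```haskell
import qualified Data.ByteString as S

-- A deliberately simplified trie, just enough structure to show the
-- problem; this is NOT the real representation in Data.Trie.Internal.
data MiniTrie a
    = Leaf
    | Arc S.ByteString (Maybe a) (MiniTrie a)
    | Branch (MiniTrie a) (MiniTrie a)

-- Reconstruct keys by appending each arc's prefix onto the key built
-- so far.  Every 'S.append' copies the whole accumulated prefix, so
-- along a chain of d arcs the early bytes get re-copied O(d) times,
-- and a prefix shared by many keys is re-copied again in every
-- descendant arc below the branch point.
keysNaive :: MiniTrie a -> [S.ByteString]
keysNaive = go S.empty
  where
    go _   Leaf              = []
    go acc (Arc pre mv rest) =
        let acc' = acc `S.append` pre   -- copies all of acc and pre
        in  [acc' | Just _ <- [mv]] ++ go acc' rest
    go acc (Branch l r)      = go acc l ++ go acc r
```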

We could easily eliminate the slowdown by instead constructing a reversed variant of lazy bytestrings (reversed, because we need snoc-lists of strict bytestrings, whereas the standard lazy bytestrings are cons-lists). However, if the caller then converts those to standard bytestrings, doing so will incur the quadratic cost again. (Going from reversed-lazy-bytestrings to lazy-bytestrings is only quadratic in the number of chunks, whereas going to strict-bytestrings is quadratic in the length.) Thus, while this approach would technically solve our problems, it only does so by pushing the problem off onto the user, which is unacceptable.
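
A sketch of what such a reversed variant might look like; the `RevLazy` type and its operations are hypothetical names chosen for this example only, not an existing API:

```haskell
import qualified Data.ByteString      as S
import qualified Data.ByteString.Lazy as L

-- Hypothetical "reversed lazy bytestring": a snoc-list of strict
-- chunks, stored innermost-first.
newtype RevLazy = RevLazy [S.ByteString]

-- O(1): extending the key with the next arc's prefix never copies
-- the chunks accumulated so far, and sibling keys share their
-- common prefix chunks.
snocChunk :: RevLazy -> S.ByteString -> RevLazy
snocChunk (RevLazy cs) c = RevLazy (c : cs)

-- Only the spine is rebuilt; the chunks themselves are shared,
-- not copied.
toLazy :: RevLazy -> L.ByteString
toLazy (RevLazy cs) = L.fromChunks (reverse cs)

-- Every byte is copied into one flat buffer.  Doing this for every
-- key is the step that brings the quadratic cost back.
toStrict :: RevLazy -> S.ByteString
toStrict = L.toStrict . toLazy
```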

I do, at least, have some ideas for how we could solve this by storing extra metadata in the trie. But so far I have no idea how great the overhead of that would be. Ideally, if the metadata can be made cheap enough to compute on the fly, then we could just compute it when we need it. Each call to the key-reconstructing functions would require two passes over the trie, but that's far better than the current approach. However, if the user wants to call these functions often, then it makes more sense to keep the metadata around; but that introduces the burden of updating it whenever changes are made to the trie. And if the cost of those updates is too great, then that pushes us towards forking the datatype so that users can decide which cost model they want, which I'd really rather not do if it can be avoided.
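
The issue leaves open what that metadata would be. Purely as an illustration of the "compute it on the fly in a first pass" option, here is a sketch (reusing the simplified `MiniTrie` from above) that annotates each node with the length of the key leading to it; that particular choice of metadata is a guess for the example, not a design taken from this issue:

```haskell
import qualified Data.ByteString as S

-- Same simplified trie as in the first sketch.
data MiniTrie a
    = Leaf
    | Arc S.ByteString (Maybe a) (MiniTrie a)
    | Branch (MiniTrie a) (MiniTrie a)

-- A node annotated with one possible piece of metadata: the length
-- of the full key up to and including this arc.
data AnnTrie a
    = ALeaf
    | AArc !Int S.ByteString (Maybe a) (AnnTrie a)
    | ABranch (AnnTrie a) (AnnTrie a)

-- The "compute it when we need it" option: a first pass that fills
-- in the metadata without copying any key bytes, so a second pass
-- can build each key exactly once into a buffer of the right size.
annotate :: MiniTrie a -> AnnTrie a
annotate = go 0
  where
    go _ Leaf              = ALeaf
    go n (Arc pre mv rest) = let n' = n + S.length pre
                             in  AArc n' pre mv (go n' rest)
    go n (Branch l r)      = ABranch (go n l) (go n r)
```

Keeping something like `AnnTrie` as the primary representation would avoid the extra pass, but then insert, delete, and merge would all have to maintain the annotation, which is exactly the update burden (and potential datatype fork) described above.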
