
Representation tweak?

Open · treeowl opened this issue on Aug 17, 2020 · 8 comments

We don't actually need a lazy spine. An alternative representation for a queue, for example:

data N10 n a = N10 !(Array n a) !()
data N11 n a = N11 !(Array n a) !() !(Array n a)
-- () separates the front digit from the rear for clarity.
-- ...

data Queue n a
  = Empty
  | Q10 (N10 n a) !(Queue ('Twice n) a)  -- the spine is strict; the node field stays lazy
  | ...

This has a little more indirection and space overhead, so presumably the fundamental operations will be slower. But it also has some benefits:

  1. length becomes logarithmic time (see the sketch after this list).
  2. We can index into the queue faster. While we have to pay off debits on all the nodes en route to the one we seek, we don't actually have to force them. This should win a nice constant-factor improvement, I believe.
  3. If I'm right about how efficient appending will be (still linear time), I believe we should get better cache performance by waiting to force nodes until we've worked out the shapes they're copied into.
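
For concreteness, here's a minimal sketch of the length idea, using a deliberately simplified queue (size index erased, a plain list standing in for the real array node, and three made-up digits Q1/Q2/Q3). The only point is that the digit is read off the strict spine constructor, so no node thunk is ever forced:

{-# LANGUAGE BangPatterns #-}

-- Simplified stand-ins: the size index is erased and a node is just a
-- placeholder for the arrays that live at one level.
data Node a = Node [a]

-- The spine constructor itself records how many arrays the level holds;
-- the node field is left lazy, but the rest of the spine is strict.
data Queue a
  = Empty
  | Q1 (Node a) !(Queue a)   -- one array at this level
  | Q2 (Node a) !(Queue a)   -- two arrays at this level
  | Q3 (Node a) !(Queue a)   -- three arrays at this level

-- Array sizes double from one level to the next, so the spine has
-- O(log n) levels, and the counts come from constructor tags alone:
-- no node thunk is forced along the way.
lengthQ :: Queue a -> Int
lengthQ = go 1 0
  where
    go :: Int -> Int -> Queue a -> Int
    go !_  !acc Empty       = acc
    go !sz !acc (Q1 _ rest) = go (2 * sz) (acc +     sz) rest
    go !sz !acc (Q2 _ rest) = go (2 * sz) (acc + 2 * sz) rest
    go !sz !acc (Q3 _ rest) = go (2 * sz) (acc + 3 * sz) rest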

treeowl · Aug 17, 2020

A compromise representation could keep the lazy spine but store its shape alongside, packed in a bit vector. That would speed length calculations and (I think) appends, without needing totally separate implementations of the basic operations.
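
Purely as an illustration of the idea (the field names, the one-bit-per-level encoding, and what the bits mean are all made up here, not the library's actual layout), the compromise might look roughly like a wrapper that carries the shape bits next to the untouched lazy-spine structure:

{-# LANGUAGE BangPatterns #-}

import Data.Bits (testBit)
import Data.Word (Word64)

-- 'q a' stands for the existing lazily-spined queue, left unchanged.
-- 'shape' packs one (made-up) bit per level, least significant first:
-- 0 = one array at that level, 1 = two arrays.
data Shaped q a = Shaped
  { shape :: !Word64   -- strict shape bits, one per level
  , depth :: !Int      -- number of levels currently in use
  , body  :: q a       -- the unchanged lazy-spine structure
  }

-- Length comes from the shape bits alone; 'body' is never forced.
-- Sizes double per level, so the loop runs O(log n) times.
lengthFromShape :: Shaped q a -> Int
lengthFromShape s = go 0 1 0
  where
    go i !sz !acc
      | i >= depth s = acc
      | otherwise    =
          let arrays = if testBit (shape s) i then 2 else 1
          in  go (i + 1) (2 * sz) (acc + arrays * sz)

Since level sizes double, 64 shape bits already cover any sequence that fits in memory; a real encoding might need a couple of bits per level, depending on how many digits a level can take.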

treeowl · Aug 17, 2020

@Lysxia, what's your intuition on this?

treeowl · Aug 17, 2020

I may be missing some context. IIUC, the difference is that forcing the spine no longer forces the vectors.

  • How is length logarithmic time (and why was it not before?)

Lysxia · Aug 17, 2020

length is currently linear amortized time because it has to force the whole spine, which may trigger array appends and splits. With this alternative, we can count how many arrays are in each level without forcing anything. So yes, your overall understanding seems right. But it's not just length; it's also shape, which looks important for appending.
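
To make the shape point concrete, reusing the simplified Q1/Q2/Q3 spine from the sketch in the first comment (again an illustration, not the library's real types): the shape is just the run of constructor tags, so an append could inspect it level by level without forcing any node thunk:

-- The shape is the sequence of digits along the strict spine, which an
-- append could consult before deciding how levels get combined.
shapeQ :: Queue a -> [Int]
shapeQ Empty       = []
shapeQ (Q1 _ rest) = 1 : shapeQ rest
shapeQ (Q2 _ rest) = 2 : shapeQ rest
shapeQ (Q3 _ rest) = 3 : shapeQ rest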

treeowl · Aug 17, 2020

I feel like the access patterns this would benefit are too peculiar to be worth it. It's easier to say that random access, and operations like length that must have a view of the whole sequence instead of just the ends, lose the lazy amortization guarantees. Only eager amortization is left, so things might as well be forced indiscriminately.

Lysxia · Aug 17, 2020

They don't lose the amortization guarantees either way. Of the functions I've listed, only length could change its amortized bounds. For the rest, it's just about constant factors.

treeowl · Aug 17, 2020

Part of the thinking on this whole approach relates to your notion of debit provenance, which is a little more refined than Okasaki's debit passing rule.

treeowl · Aug 17, 2020

One yuck point about the wholesale change: we'd have to manage strictness very much "by hand". It's not terrible, but it's another place to make mistakes. On the plus side, if we ever want a worst-case version (for cache performance under certain assumptions, not total space), then we'll need something like that and more.

treeowl · Aug 18, 2020