compact-sequences
doc request: sources
Hi.
I came here from your message https://mail.haskell.org/pipermail/haskell-cafe/2020-August/132593.html
Could you please add (in the README) a few words about these data structures and their use cases? When should I prefer them over Data.Sequence (etc.)? When space is important? Could you link to a paper or talk that has this info? Does this idea already exist in the literature?
Thanks - J.
I don't know if this idea exists in the literature or not. I haven't seen it myself, but that really means very little. As a non-academic, I'm not even sure how I'd approach the question. If you can help with that, please let me know. My Haskell Love talk starts just after 4:10 in this video; I expect the properly broken-out/edited version will go up sometime in the next week or so.
Whether (and when) the structure is valuable in practice remains to be seen. My best guess is that it will make sense in applications with light to moderate use of persistence. We'll really need to address #3 and perhaps #10 to get a good idea of practical performance. The baseline versions here have quite a lot of indirection and churn right at the tippy top of the structures (the first 2 to 4 nodes or so) that doesn't remotely pay for itself.
Data.Sequence is a much more flexible data structure, which will be far better if you need to do a lot of splitting or appending.
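To illustrate the kind of workload that favors Data.Sequence: both splitting and appending run in O(log n) there, using the standard containers API (`Seq.splitAt` and `><`). The `rotateL` helper below is just a hypothetical example combining the two.

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (><))

-- Rotate a sequence left by k: split at position k, then append the
-- halves in swapped order.  Both Seq.splitAt and (><) are O(log n)
-- for Data.Sequence, which is exactly the kind of operation mix
-- where it beats simpler stack/queue structures.
rotateL :: Int -> Seq a -> Seq a
rotateL k s = r >< l
  where
    (l, r) = Seq.splitAt k s
```

A structure optimized purely for stack or queue access would pay O(n) for the same split.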
One way to search the literature is to find any related work (in this case, Okasaki's thesis), and then look up papers citing and cited by it in Google Scholar, possibly with some keywords (e.g., "functional queue") to narrow the search.
From a very cursory search, the idea of arrays in powers of two is definitely out there, although I didn't see this exact configuration.
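For readers unfamiliar with the general "arrays in powers of two" pattern, here is a minimal sketch of the classic version, an Okasaki-style binary random-access list. This is not the structure in this package, just the well-known relative: blocks whose sizes mirror the binary representation of the length, with cons acting like a binary increment. (Perfect binary trees stand in for the arrays so that merging two blocks is O(1).)

```haskell
-- Perfect binary trees hold the elements; each block carries its size.
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Invariant: block sizes are distinct powers of two, strictly
-- increasing, mirroring the binary representation of the length.
newtype RAList a = RAList [(Int, Tree a)]

empty :: RAList a
empty = RAList []

-- Consing is a binary increment: push a size-1 block, then "carry" by
-- merging while the two front blocks have equal size.  Each merge
-- links two trees in O(1); the amortized cost of cons is O(1).
cons :: a -> RAList a -> RAList a
cons x (RAList ts) = RAList (carry (1, Leaf x) ts)
  where
    carry (s, t) ((s', t') : rest)
      | s == s' = carry (2 * s, Node t t') rest
    carry p rest = p : rest

-- O(log n) indexing: skip whole blocks, then descend one tree.
index :: RAList a -> Int -> a
index (RAList ts) i0 = go ts i0
  where
    go ((s, t) : rest) i
      | i < s     = look s t i
      | otherwise = go rest (i - s)
    go [] _ = error "index out of range"
    look _ (Leaf x) _ = x
    look s (Node l r) i
      | i < half  = look half l i
      | otherwise = look half r (i - half)
      where half = s `div` 2
```

Replacing the trees with flat arrays of the same power-of-two sizes is what pushes this family toward the compact end of the design space.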
@Lysxia, did you see any power-of-two arrays in related configurations I might want to take a look at?
I'd be especially interested in anything that might point the way toward either o(n) space overhead in o(log n) time, or a proof that that's impossible in the context of GHC Haskell.
I can't guarantee you'll find something worthwhile, but two papers that caught my eye are
- http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=BA916D23E3FFE7C508883C0CBC5CCEBC?doi=10.1.1.13.3451&rep=rep1&type=pdf
- http://www.cphstl.dk/Paper/Strictly-regular-number-system/final.pdf
I'll be somewhat surprised if the Kaplan and Tarjan paper proves relevant. My general impression is that the structures there are closely related to Okasaki's, but use much more complicated number systems to get worst-case bounds instead of amortized ones. Thanks for pointing out the other paper; I hadn't encountered it before, so I'll give it a skim.
Ooh, the edited video is up!