compact-sequences icon indicating copy to clipboard operation
compact-sequences copied to clipboard

doc request: sources

Open jwaldmann opened this issue 4 years ago • 7 comments

Hi.

I came here from your message https://mail.haskell.org/pipermail/haskell-cafe/2020-August/132593.html

Could you please add (in the README) a few words on this data structure(s) and their use cases? When should I prefer it over Data.Sequence (etc.)? (when space is important?) Link to the paper/talk that has this info? Does this idea exist in the literature already?

Thanks - J.

jwaldmann avatar Aug 13 '20 11:08 jwaldmann

I don't know if this idea exists in the literature or not. I haven't seen it myself, but that really means very little. As a non-academic, I'm not even sure how I'd approach the question. If you can help with that, please let me know. My Haskell Love talk starts just after 4:10 in this video; I expect the properly broken-out/edited version will go up sometime in the next week or so.

When/if the structure is valuable in practice remains to be seen. My best guess is that it will make sense in applications with light to moderate use of persistence. We'll really need to address #3 and perhaps #10 to get a really good idea of practical performance. The baseline versions here will have quite a lot of indirection and churn right at the tippy top of the structures (the first 2 to 4 nodes or so) that doesn't remotely pay for itself.

Data.Sequence is a much more flexible data structure, which will be far better if you need to do a lot of splitting or appending.

treeowl avatar Aug 14 '20 18:08 treeowl

One way to search the literature is to find any related work (in this case, Okasaki's thesis), and then look up papers citing and cited by it in Google Scholar, possibly with some keywords (e.g., "functional queue") to narrow the search.

From a very cursory search, the idea of arrays in powers of two is definitely out there, although I didn't see this exact configuration.

Lysxia avatar Aug 14 '20 22:08 Lysxia

@Lysxia, did you see any power-of-two arrays in related configurations I might want to take a look at?

treeowl avatar Aug 16 '20 20:08 treeowl

I'd be especially interested in anything that might point the way toward either o(n) space overhead in o(log n) time or a proof that that's impossible in the context of GHC Haskell.

treeowl avatar Aug 16 '20 20:08 treeowl

I can't guarantee you'll find something worthwhile, but two papers that caught my eye are

  • http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=BA916D23E3FFE7C508883C0CBC5CCEBC?doi=10.1.1.13.3451&rep=rep1&type=pdf

  • http://www.cphstl.dk/Paper/Strictly-regular-number-system/final.pdf

Lysxia avatar Aug 17 '20 00:08 Lysxia

I'll be somewhat surprised if the Kaplan and Tarjan paper proves relevant. My general impression is that their structures there are closely related to Okasaki's, but with much more complicated number systems to get worst-case bounds instead of amortized ones. Thanks for pointing out the other paper; I haven't encountered it before and I'll give it a skim.

treeowl avatar Aug 17 '20 01:08 treeowl

Ooh, the edited video is up!

treeowl avatar Aug 17 '20 21:08 treeowl