sux-rs
sux-rs copied to clipboard
`EliasFano` is hard to use
I expected .get to be readily available after calling build() but that's not the case.
From the docs:
let ef = unsafe { ef.map_high_bits(SelectAdaptConst::<_, _>::new) };
I would have no idea what this does, and it feels weird to me that to use EF for anything useful I need an unsafe. Is the guarantee that high bits remain the same truely needed to prevent undefined behavior? If it's only needed to guarantee correct results, but the EF structure still works when provided 'wrong' bits, I think the unsafe could be dropped.
Also, the type that this produces is hard to spell: EliasFano<SelectAdaptConst<BitVec<Box<[usize]>>>>. As a user, SelectAdaptConst doesn't mean anything to me.
It would be more convenient to just add build_with_seqand build_with_dict and build_with_seq_and_dict methods or something along those lines, with add type aliases for their outputs.
(Also SelectAdaptConst sounds conflicting to me, but probably it's one aspect that's adaptive and another aspect that's const? Otherwise SelectConst is shorter/simpler.)
Also a .size() method would be nice, since the current estimate_size needs the generic parameters:
let size = EliasFano::<(), ()>::estimate_size(self.offsets.len(), self.sizes.len());
Ideally that size would also include the size for underlying rank-select structures.
So... yes, the problem is that when you map the high bits there's no way to guarantee that the bits are the same. So you could replace the high bits with something else that causes out-of-memory accesses, for example.
I think your idea of having build_with_seq, build_with_dict, and build_with_seq_and_dict with type aliases could be a good idea. The user would have simple types (we would fix the const parameters), and would resort to unsafe only to tweak things (e.g., using a different selection structure)
“Adaptive” refers to the fact that the selection structures adapts the subinventory elements from 16 to 64 bits, depending on the inventory span. The previous SimpleSelect implementation wasn't doing that.
But the goal here is to let the user have complete freedom on the implementation of EF. You could use a different selection structure—in fact, a different type of selection structure for zeros and ones. The selection structure does not even have to be part of sux.
I've tried to implement your suggestions in the repo. Can you have a look and tell me what you think? The documentation of the structure still need to be updated tho.
lgtm, if you can merge #51.
I suppose I'm not super familiar with the seq and dict interpretations, so in practice I'd probably just try whichever extention gives me the operations I need, but that's fine :)