Is it possible to implement Search/Seek methods?
Is it theoretically possible? Example:
comp := intcomp.compress([10,20,30])
// get number by value
incomp.Search(comp, 10) // 1
incomp.Search(comp, 20) // 2
incomp.Search(comp, 30) // 3
// find value >= v
incomp.Seek(comp, 8) // 10
incomp.Seek(comp, 10) // 10
incomp.Search(comp, 17) // 20
incomp.Search(comp, 21) // 30
incomp.Search(comp, 31) // -1
Perhaps, it would be helpful to have an iterating method (like (sync.Map).Range), so that reslicing, searching, seeking, etc... is possible in streaming way (without allocating a full decompressed list). Though, performance might be not great compared to specialized functions.
What about implementing an iterator? I find it more practical to use when you need to iterate over multiple arrays at the same time.
Something like this:
it:= NewInt32Iterator(comp)
for v, valid := it.Next(); valid; v, valid = it.Next() {
// do whatever with v...
}
A reverse iterator would also be useful, but not sure if it's possible to implement efficiently.
Arbitrary reslicing, is not feasible since encoding is performed by blocks of 128 or 256 values.
Having way for stream-style iteration - useful feature (and for me also).
But I did mean another thing: to do Search/Seek without touching/decompress all previous values. Use case: imagine I do store very large compressed immutable list in MMAP file. As less places in compressed form I will touch - less disk reads (and RAM) will happen. So, in this case - iterator is the worst case (It will brude-force touch all compressed bytes and read them from disk).
In my use-case I have many (>> RAM) such lists (actually it's lists of offsets in some files). FYI: Now we using some made-by-us implementation of EliasFano encoding.
Right now the compression scheme doesn't allow decoding at an arbitrary position in the compressed buffer. So it's not really possible to implement a binary search.
I am considering modifying the compression logic in order to force the creation of new blocks at fixed positions in the compressed buffer. This has a small cost as a few bytes may be wasted at the end of the blocks, but it will allow decompression at arbitrary position.
I have updated the README with next steps in this direction.