compact_vector

How about mmapping these bad boys?

Open rob-p opened this issue 6 years ago • 6 comments

Hi again, @gmarcais!

Another random question / feature request. Imagine that I want to use a compact_vector to store a very large array of encoded integers (e.g. a large suffix array or such). I'm going to compute this array once, at great cost, and then use it many times. If the vector is sufficiently large, one spends a lot of time deserializing it into RAM. However, since the layout is so nice, it might make sense to just mmap it so that we can start using it immediately. What do you think is the best way to do this with compact_vector?

rob-p, Feb 18 '19 18:02

My first thought is: can an Allocator class be used that is backed by an mmapped file? All the constructors take (or should take) an allocator object as their last argument.

Is it enough? Is there a need for explicit support in compact_vector?

gmarcais, Feb 18 '19 18:02

The only issue I envision is that one may want to use an mmap allocator sometimes but not always (e.g. contingent on an input argument). I think the allocator approach could work as long as compact_vector is polymorphic allocator aware (https://en.cppreference.com/w/cpp/memory/polymorphic_allocator). Basically, one would want the allocator type to not modify the overall type of the compact_vector.

rob-p, Feb 18 '19 19:02

The allocator is a template parameter. Isn't it sufficient to do something like:

template<typename IDX, unsigned BITS = 0, typename W = uint64_t>
using ts_vector = compact_vector::ts_vector<IDX, BITS, W, std::pmr::polymorphic_allocator<W>>;

Now you can use ts_vector with polymorphic allocators to your heart's delight, choosing at runtime which allocator to use.

I'll admit I have not used polymorphic allocators yet. I would be curious how easy or difficult it is to write such an allocator.
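As a rough illustration of how little code such an allocator might need, here is a minimal sketch of a std::pmr::memory_resource that hands out one caller-owned region (which could come from mmap). The names region_resource and make_words are hypothetical and not part of compact_vector, and std::pmr::vector stands in for the compact vector so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical resource: serves one live allocation out of a caller-owned
// region (for example, memory obtained from mmap). Not part of compact_vector.
class region_resource : public std::pmr::memory_resource {
public:
    region_resource(void* base, std::size_t bytes) : base_(base), bytes_(bytes) {}

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        // A single region cannot back a second live allocation, so a container
        // that tries to grow past its first allocation fails loudly.
        if (in_use_ || bytes > bytes_ ||
            reinterpret_cast<std::uintptr_t>(base_) % alignment != 0)
            throw std::bad_alloc();
        in_use_ = true;
        return base_;
    }

    void do_deallocate(void* p, std::size_t, std::size_t) override {
        assert(p == base_);
        in_use_ = false;  // the region itself is released by whoever owns it
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    void* base_;
    std::size_t bytes_;
    bool in_use_ = false;
};

// The backing store is chosen at run time; the container type does not change.
inline std::pmr::vector<std::uint64_t>
make_words(std::size_t n, std::pmr::memory_resource* mr) {
    std::pmr::vector<std::uint64_t> v(mr);
    v.reserve(n);  // a single allocation request for at least n words
    return v;
}
```

Passing &some_region_resource or std::pmr::new_delete_resource() to make_words yields the same type either way. One caveat with this sketch: a standard container constructs its elements, so resizing it over a region that already holds data would overwrite that data; a read-only view is likely a better fit for pre-built files.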

gmarcais, Feb 19 '19 04:02

Having a way to serialize the data structures (just get a few pointers with offsets and a few widths) and a constructor that takes the same pieces would already be of great value for our research database.

From skimming the code, I haven't seen anything that would rule such methods out.

Bouncner, Feb 18 '23 13:02

There are really two classes. The compact iterator does all the actual work. All it cares about is having a base pointer and the length of the elements in bits. It really does not care where the memory comes from (allocated by the vector class, or directly with malloc or mmap). The vector class does very little: it allocates memory using the allocator template parameter, and otherwise delegates everything to the iterator.
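To make the "base pointer plus bit width" idea concrete, here is a minimal sketch of a read-only view over packed words. packed_view and pack are hypothetical names, and the bit layout used here (LSB-first, elements allowed to straddle two 64-bit words) is an assumption that may not match compact_vector's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical read-only view. It only needs a base pointer, the element
// width in bits, and the element count; where the memory lives is irrelevant.
struct packed_view {
    const std::uint64_t* base;  // start of the packed words (heap, mmap, ...)
    unsigned bits;              // bits per element, 1..64
    std::size_t size;           // number of elements

    std::uint64_t operator[](std::size_t i) const {
        assert(i < size && bits >= 1 && bits <= 64);
        const std::size_t bit = i * bits;
        const std::size_t word = bit / 64;
        const unsigned off = static_cast<unsigned>(bit % 64);
        const std::uint64_t mask =
            bits == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
        std::uint64_t v = base[word] >> off;
        if (off + bits > 64)  // element spills into the next word
            v |= base[word + 1] << (64 - off);
        return v & mask;
    }
};

// Pack values (each assumed to fit in `bits` bits) using the same assumed layout.
inline std::vector<std::uint64_t>
pack(const std::vector<std::uint64_t>& values, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    std::vector<std::uint64_t> words((values.size() * bits + 63) / 64, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::size_t bit = i * bits;
        const std::size_t word = bit / 64;
        const unsigned off = static_cast<unsigned>(bit % 64);
        words[word] |= values[i] << off;
        if (off + bits > 64)  // write the high part into the next word
            words[word + 1] |= values[i] >> (64 - off);
    }
    return words;
}
```

A view like this never allocates or frees anything, which is why it works equally well over a heap buffer or a mapped file.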

There are a few non-standard calls: get() returns the base pointer of the vector, bytes() returns the number of bytes in the vector (up to capacity, not just length), and bits() returns the number of bits per element. So everything is there to serialize the data structure.

Unfortunately, the vector class can't be recreated directly from these elements, although it could be done with an allocator. It is probably best done within an allocator anyway, as otherwise methods that resize the vector may use the wrong type of allocator to manage the pointer.
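A minimal sketch of how those three values could be written to disk and mapped back read-only, assuming a POSIX system. The file_header layout, save, and mapped_file are hypothetical and not part of compact_vector; the data and bytes arguments stand in for what get() and bytes() return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical on-disk header: 16 bytes, so the payload stays 8-byte aligned.
struct file_header {
    std::uint64_t size;  // number of elements
    std::uint64_t bits;  // bits per element
};

// Write the header followed by the raw words.
inline void save(const char* path, const void* data, std::size_t bytes,
                 std::uint64_t size, std::uint64_t bits) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) throw std::runtime_error("cannot open file for writing");
    const file_header h{size, bits};
    const bool ok = std::fwrite(&h, sizeof h, 1, f) == 1 &&
                    std::fwrite(data, 1, bytes, f) == bytes;
    if (std::fclose(f) != 0 || !ok) throw std::runtime_error("write failed");
}

// Read-only mapping of a saved file; the mapping is released on destruction.
class mapped_file {
public:
    explicit mapped_file(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open file");
        struct stat st;
        if (::fstat(fd, &st) != 0 ||
            static_cast<std::size_t>(st.st_size) < sizeof(file_header)) {
            ::close(fd);
            throw std::runtime_error("file too small");
        }
        len_ = static_cast<std::size_t>(st.st_size);
        map_ = ::mmap(nullptr, len_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (map_ == MAP_FAILED) throw std::runtime_error("mmap failed");
    }
    ~mapped_file() { ::munmap(map_, len_); }
    mapped_file(const mapped_file&) = delete;
    mapped_file& operator=(const mapped_file&) = delete;

    const file_header& header() const {
        return *static_cast<const file_header*>(map_);
    }
    const std::uint64_t* words() const {
        return reinterpret_cast<const std::uint64_t*>(
            static_cast<const char*>(map_) + sizeof(file_header));
    }
    std::size_t payload_bytes() const { return len_ - sizeof(file_header); }

private:
    void* map_ = nullptr;
    std::size_t len_ = 0;
};
```

After mapping, words() together with header().bits and header().size is everything a read-only accessor needs, and pages are only faulted in when they are first touched. The sketch ignores endianness and file versioning, which a real format would probably want to record in the header.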

gmarcais, Feb 20 '23 15:02

Argh, yes. We are using the vectors read-only. Never thought about growing.

Bouncner, Feb 20 '23 19:02