redb icon indicating copy to clipboard operation
redb copied to clipboard

Will memory alignment be supported?

Open Kerollmops opened this issue 1 year ago • 7 comments

Hey,

I was looking at redb and thought about one feature that could be very interesting and that is supported only by sanakirja right now. Memory alignment could bexcitingng when it comes to speed, for example, avoid having to copy memory could bring more performance.

Kerollmops avatar Sep 11 '22 22:09 Kerollmops

Very unlikely, because that could cause portability issues: type alignment is architecture dependent. I would also be surprised if the performance difference was significant. In most cases you can use functions like u64::from_le_bytes without penalty

cberner avatar Sep 11 '22 23:09 cberner

I was thinking more about a bigger data-structure like roaring bitmaps. In C they developed a "frozen format" that can be directly reinterpreted and is much faster than allocating and copying entire containers.

Kerollmops avatar Sep 12 '22 11:09 Kerollmops

I see. Supporting external libraries that require alignment isn't one that I had considered. Has anyone tried porting roaring bitmaps to Rust and removing the alignment requirements? All the benchmarks I've seen show that there's basically no different between aligned and unaligned access on modern CPUs.

cberner avatar Sep 13 '22 15:09 cberner

Thank you for reopening this issue.

Has anyone tried porting roaring bitmaps to Rust and removing the alignment requirements?

There indeed is a pure Rust port of the RoaringBitmap library that I maintain. There is no alignment required for the default de/serialization format, but it brings a lot of performance impact when you need to deserialize it in memory to possibly just do a big union of them (you could keep them frozen) and generate a new in-memory-mutable bitmap.

I was thinking about using the power of memory mapping to keep things where they are and avoid duplicating them elsewhere to be able to read them when this would be unnecessary if the memory were correctly aligned.

We talked a little bit about this format with Daniel Lemire in this issue. It can be hard to read, so I advise you only to read the last message:

In C, we use a distinct format (which we refer to as frozen) which provides the necessary alignment assuming that the starting pointer is itself 32-bit aligned.

Kerollmops avatar Sep 13 '22 16:09 Kerollmops

Ah yes, I can see why that's expensive. Is it possible to instead borrow the &[u8] that represents the roaring bitmap data, and read the byte as u32, u64...etc as needed, rather than deserializing it into a RoaringBitmap object?

I took that approach with my GroupedBitmap struct that is logically an &[u64], but instead it performs accesses into the &[u8] using from_le_bytes() to avoid having alignment requirements.

cberner avatar Sep 14 '22 03:09 cberner

Is it possible to instead borrow the &[u8] that represents the roaring bitmap data, and read the byte as u32, u64...etc as needed, rather than deserializing it into a RoaringBitmap object?

It seems like a good idea, but I am not sure it will be easy to implement instead of just porting the frozen format C implementation to Rust. However, performance cost can be significant on some arch like the ARMv7 (Apple Mac M1 is an ARMv8) where an unaligned instruction generates four instructions instead of two when the bytes are unaligned.

I am just thinking about the implementation detail on your side. Wouldn't it be some padding bytes in front of the value to make it correctly aligned on the page? I understand that you must keep the length of the padding to be able to skip it when reading 🤔

Kerollmops avatar Sep 14 '22 08:09 Kerollmops

However, performance cost can be significant on some arch like the ARMv7 (Apple Mac M1 is an ARMv8) where an unaligned instruction generates four instructions instead of two when the bytes are unaligned.

Well, additional instructions doesn't necessarily translate into a difference in performance. Except for very tight loops, instructions should be practically free compared with the loads from memory/disk that redb has to do.

Oh yes, the implementation would be pretty straightforward. The thing that I'm worried about is portability between architectures. The alignment of the primitive types is not specified in Rust, and is platform dependent. I have not review every architecture that's supported to see if they have the same alignment, but I could imagine a 64bit platform on which say u32 has 8 byte alignment

cberner avatar Sep 14 '22 15:09 cberner