redb icon indicating copy to clipboard operation
redb copied to clipboard

Will memory alignment be supported?

Open Kerollmops opened this issue 3 years ago • 7 comments
trafficstars

Hey,

I was looking at redb and thought about one feature that could be very interesting and that is supported only by sanakirja right now. Memory alignment could bexcitingng when it comes to speed, for example, avoid having to copy memory could bring more performance.

Kerollmops avatar Sep 11 '22 22:09 Kerollmops

Very unlikely, because that could cause portability issues: type alignment is architecture dependent. I would also be surprised if the performance difference was significant. In most cases you can use functions like u64::from_le_bytes without penalty

cberner avatar Sep 11 '22 23:09 cberner

I was thinking more about a bigger data-structure like roaring bitmaps. In C they developed a "frozen format" that can be directly reinterpreted and is much faster than allocating and copying entire containers.

Kerollmops avatar Sep 12 '22 11:09 Kerollmops

I see. Supporting external libraries that require alignment isn't one that I had considered. Has anyone tried porting roaring bitmaps to Rust and removing the alignment requirements? All the benchmarks I've seen show that there's basically no different between aligned and unaligned access on modern CPUs.

cberner avatar Sep 13 '22 15:09 cberner

Thank you for reopening this issue.

Has anyone tried porting roaring bitmaps to Rust and removing the alignment requirements?

There indeed is a pure Rust port of the RoaringBitmap library that I maintain. There is no alignment required for the default de/serialization format, but it brings a lot of performance impact when you need to deserialize it in memory to possibly just do a big union of them (you could keep them frozen) and generate a new in-memory-mutable bitmap.

I was thinking about using the power of memory mapping to keep things where they are and avoid duplicating them elsewhere to be able to read them when this would be unnecessary if the memory were correctly aligned.

We talked a little bit about this format with Daniel Lemire in this issue. It can be hard to read, so I advise you only to read the last message:

In C, we use a distinct format (which we refer to as frozen) which provides the necessary alignment assuming that the starting pointer is itself 32-bit aligned.

Kerollmops avatar Sep 13 '22 16:09 Kerollmops

Ah yes, I can see why that's expensive. Is it possible to instead borrow the &[u8] that represents the roaring bitmap data, and read the byte as u32, u64...etc as needed, rather than deserializing it into a RoaringBitmap object?

I took that approach with my GroupedBitmap struct that is logically an &[u64], but instead it performs accesses into the &[u8] using from_le_bytes() to avoid having alignment requirements.

cberner avatar Sep 14 '22 03:09 cberner

Is it possible to instead borrow the &[u8] that represents the roaring bitmap data, and read the byte as u32, u64...etc as needed, rather than deserializing it into a RoaringBitmap object?

It seems like a good idea, but I am not sure it will be easy to implement instead of just porting the frozen format C implementation to Rust. However, performance cost can be significant on some arch like the ARMv7 (Apple Mac M1 is an ARMv8) where an unaligned instruction generates four instructions instead of two when the bytes are unaligned.

I am just thinking about the implementation detail on your side. Wouldn't it be some padding bytes in front of the value to make it correctly aligned on the page? I understand that you must keep the length of the padding to be able to skip it when reading 🤔

Kerollmops avatar Sep 14 '22 08:09 Kerollmops

However, performance cost can be significant on some arch like the ARMv7 (Apple Mac M1 is an ARMv8) where an unaligned instruction generates four instructions instead of two when the bytes are unaligned.

Well, additional instructions doesn't necessarily translate into a difference in performance. Except for very tight loops, instructions should be practically free compared with the loads from memory/disk that redb has to do.

Oh yes, the implementation would be pretty straightforward. The thing that I'm worried about is portability between architectures. The alignment of the primitive types is not specified in Rust, and is platform dependent. I have not review every architecture that's supported to see if they have the same alignment, but I could imagine a 64bit platform on which say u32 has 8 byte alignment

cberner avatar Sep 14 '22 15:09 cberner

For using rkyv in redb, memory alignment would also be very interesting. I'd think it's sufficiently useful to make it available even if portability can't be ensured when using it.

ousado avatar Nov 26 '22 17:11 ousado

Fyi trying to actually use rkyv with redb right now, and running into alignment issues with data returned from redb, so I would very much appreciate anything in this direction.

dignifiedquire avatar Jan 02 '23 23:01 dignifiedquire

I was thinking that the simplest way to support alignment may be:

  • Only support aligning byte slices. Push doing deserialization from aligned byte slices to the user.
  • Only support aligning whole redb keys and redb values. (Not elements of tuples.)
  • Require that the user ask for the alignment they want, so that padding bytes aren't used when alignment isn't required, and so redb doesn't need to worry about different alignment on different architectures, it's up to the user to ask for the alignment they need.

In practice, this might look like a type like:

struct Aligned<const N: usize>;

Which can be used as a redb key or value, and whose value is a byte slice for which slice.as_ptr() % N == 0 is guaranteed when read from the database.

casey avatar Jan 12 '23 06:01 casey

@casey that sounds very reasonable, and would make it straightforward to integrate with tools like rkyv and zerocopy to allow for custom zero copy types.

dignifiedquire avatar Jan 12 '23 11:01 dignifiedquire

@casey yep, that's how I'm thinking of implementing it to start. I'm going to try and make it flexible enough that if I end up making RedbKey & RedbValue public in the future it will be possible to specify alignment for custom types, and I'd also like it to be possible to implement aligned nested types inside tuples (not going to support that to start with though, since it'd be pretty annoying to implement)

cberner avatar Jan 13 '23 16:01 cberner

I think this is probably blocked on https://github.com/rust-lang/rust/issues/76560

To ensure that the alignment is checked at compile time, I need code like this to work: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=bb5f314b6aa0e29f26566fb7d715d7eb and I don't see a way to implement that on stable. If anyone has ideas lemme know!

cberner avatar Jan 15 '23 00:01 cberner

https://github.com/cberner/redb/pull/490 adds alignment to the file format, so that it can be supported in the future without requiring a file format upgrade

cberner avatar Jan 15 '23 01:01 cberner

I did some work on this in the alignment branch. The next step from that branch is changing RedbValue::from_bytes() to take an AlignedSlice<Self::ALIGNMENT>

However, I'm not sure this is going to be possible in the near future with the Rust type system. Basically, I need code like this to work: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=00c17b86bc4d44cec05ffab93da10cc0 and it looks like that's been ruled out for the moment due to it not being const well-formed. I don't entirely understand const wf, but it looks like https://github.com/rust-lang/rust/issues/76560 won't be sufficient to make that branch work.

Anyone have ideas for making this work?

cberner avatar Feb 08 '23 02:02 cberner

If you make the size a generic parameter it could work, not sure this fits in the rest of your design: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a239c92b3f0225c0d472bf635deb32e7

dignifiedquire avatar Feb 08 '23 16:02 dignifiedquire

ya, I looked into that solution briefly, and good to see that it seems to work. However, I think it's probably not going to be a good fit for the rest of the redb API, because then those alignment generic parameters are going to pollute the whole API since nearly everything references RedbValue and would now have twice as many parameters. I'll play with it a bit more though.

cberner avatar Feb 09 '23 05:02 cberner

@dignifiedquire I tried that out, and unfortunately it's not going to work well. The const generic parameter would leak out into the whole API, requiring things like const MY_TABLE: TableDefinition<&str, 1, &str, 1> ... instead of TableDefinition<&str, &str>.

I think the best workaround is going to be to store the values that require alignment in a separate structure, such as a separate mmap'ed file, and then store the offset & length in redb to allow retrieving the value.

Or if you know anyone who works on the const generic type system, maybe you can pitch them on adding support for this case :)

cberner avatar Feb 10 '23 23:02 cberner

I added an example showing one way to workaround this. Please re-open if you find a way around the Rust associated const generic issue though! I do think this would be a nice feature to have in the core API, but don't want to add it if it requires adding const generic parameters to everything.

cberner avatar Feb 11 '23 00:02 cberner

Would it be possible to move alignment concerns out of the type system? I haven't studied redb's internals yet, but it seems like adding/skip padding could be dynamic, with redb's API guaranteeing that the argument to RedbValue::from_bytes matches the required alignment, and no other interface changes. Downstream code could use e.g. slice::align_to to leverage this safely. This would confer the practical benefits benefits without placing more demands on the language.

Ralith avatar Mar 05 '23 21:03 Ralith

Ya, that's definitely possible, but I don't want to implement it that way because it would be too easy to introduce bugs

cberner avatar Mar 05 '23 23:03 cberner

Nothing an assert couldn't guard against, surely?

Ralith avatar Mar 06 '23 08:03 Ralith