flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

[swift] support for unaligned buffers

Open mr-swifter opened this issue 1 year ago • 18 comments

Usually when reading data from network or from binary file there is no guarantee that data will be placed in memory with respect to alignment. So, trying to use such memory buffers as is will lead to precondition.

In swift 5.7 there is a new loadUnaligned function is available, so plain data types can be accessed without respect to alignment. So, seems that means by using loadUnaligned we can create flatbuffer from any buffer and it will work normally.

So, I suggest to start using loadUnaligned instead of load. storeBytes works without changing function signature. Do i miss anything?

mr-swifter avatar Nov 14 '22 14:11 mr-swifter

So flatbuffers usually do align the buffers according to what they will be used for. For example here, where we Pre-align the buffer to write the data properly. So in terms of flatbuffers being safe to be written in swift they are.

  @inline(__always)
  @discardableResult
  mutating public func push<T: Scalar>(element: T) -> UOffset {
    let size = MemoryLayout<T>.size
    preAlign(
      len: size,
      alignment: size)
    _bb.push(value: element, len: size)
    return _bb.size
  }

Regarding the reading, we should always verify the buffer especially if they are coming from a network. which uses the following function:

public func getCheckedRoot<T: FlatBufferObject & Verifiable>(
  byteBuffer: inout ByteBuffer,
  fileId: String? = nil,
  options: VerifierOptions = .init()) throws -> T
{
  var verifier = try Verifier(buffer: &byteBuffer, options: options)
  if let fileId = fileId {
    try verifier.verify(id: fileId)
  }
  try ForwardOffset<T>.verify(&verifier, at: 0, of: T.self)
  return T.init(
    byteBuffer,
    o: Int32(byteBuffer.read(def: UOffset.self, position: byteBuffer.reader)) +
      Int32(byteBuffer.reader))
}

This will always check for alignment, and content according to what the code is supposed to actual read. By default we always check for alignment, but if you want to skip that, you can always re-write the function yourself.

mustiikhalil avatar Nov 14 '22 14:11 mustiikhalil

@dbaileychess please do correct me if i'm wrong regarding writing into a flatbuffer and that the values are always aligned

mustiikhalil avatar Nov 14 '22 14:11 mustiikhalil

By default we always check for alignment, but if you want to skip that, you can always re-write the function yourself.

I'm not sure I understand what do you mean by "re-write the function yourself". It's flatbuffers API function, isn't it?

Maybe I should be a bit more precise in use case I want to fix. For example, I have something like:

func handleMessage(message: UnsafeMutableRawPointer, capacity: Int) { 
// handle function called with pointer to some buffer allocated and read from network somewhere. 
// I don't what to copy it, so I want to create flat buffer message on top of it
let byteBuffer = ByteBuffer(assumingMemoryBound: message, capacity: capacity)
// Now byte buffer keep unowned reference to memory buffer, let's try to create flatbuffer
let myMessage = MyMessage.getRootAsMyMessage(bb: byteBuffer)
// And I get precondition here! 

I don't see much difference vs getCheckedRoot since any read, for example, byteBuffer.read(def: UOffset.self, position: byteBuffer.reader) can cause the same problem since it ends up with loading UOffset from memory.

mr-swifter avatar Nov 14 '22 15:11 mr-swifter

@mr-swifter

I'm not sure I understand what do you mean by "re-write the function yourself". It's flatbuffers API function, isn't it?

Yes it is, but it's written that way so that users can copy and paste the code and modify it as they please. for example, the verifier struct can also be created manually and from there we can set the alignment checking to false.

Is the message aka UnsafeMutableRawPointer a valid single object flatbuffer?

Because I do understand, the fact that there is an issue reading the buffer, but if the buffer wasn't aligned properly first then the issue might be in creating the buffer, or the buffer isn't complete. and honestly i would rather have the precondition fail safe rather than just allowing unaligned reading from memory.

But let's see what @dbaileychess would say regarding the alignment. If you have example of the buffer that would be also great!

mustiikhalil avatar Nov 14 '22 15:11 mustiikhalil

I don't see much difference vs getCheckedRoot since any read, for example, byteBuffer.read(def: UOffset.self, position: byteBuffer.reader) can cause the same problem since it ends up with loading UOffset from memory.

Yes it does read the Uoffset but after it validates if its a proper Uoffset that will not break the app. If you look at this you will see we are validating the UOffset first, and validating that the UOffset is within the buffer

  try ForwardOffset<T>.verify(&verifier, at: 0, of: T.self)

mustiikhalil avatar Nov 14 '22 16:11 mustiikhalil

@mustiikhalil the flatbuffer message will be appropriately aligned internally (and the message is valid), but when reading multiple messages streaming from e.g. a network buffer the starting position for a given message is not necessarily aligned appropriately when they are sequentially laid out in memory - using the loadUnaligned function as suggested allows for direct access of the memory as a flatbuffers message without requiring making a copy (which is one of the main features of flatbuffers!) and is a re ent addition to Swift (5.7 iirc). The same may be an issue if mmaping a file containing flatbuffers messages sequentially laid out. Using load unaligned allows these use cases to work as intended too….

hassila avatar Nov 14 '22 16:11 hassila

Yes, exactly, thanks @hassila !

I think the test I've added visualise the issue:

func testUnalignedRead() {
...
             let fbb = createMonster(withPrefix: false)
...
             // Unaligned read
             let testUnaligned: () -> Bool = {
                 var bytes: [UInt8] = [0x00]
                 bytes.append(contentsOf: fbb.sizedByteArray)
                 return bytes.withUnsafeMutableBytes { ptr in
                     guard var baseAddress = ptr.baseAddress else {
                         XCTFail("Base pointer is not defined")
                         return false
                     }
                     baseAddress = baseAddress.advanced(by: 1)
                     let unlignedPtr = UnsafeMutableRawPointer(baseAddress)
                     var monster = Monster.getRootAsMonster(bb:
                         ByteBuffer(assumingMemoryBound: unlignedPtr, capacity: ptr.count - 1))

                     self.readFlatbufferMonster(monster: &monster)
                     return true
                 }
             }
...

Here I emulate this, by creating buffer, move it by one byte (like I got from network buffer with such offset) and try to access this correct flatbuffer, but get precondition on first load of UInt32, since starting point is not aligned to uint32 memory layout.

mr-swifter avatar Nov 14 '22 17:11 mr-swifter

Makes sense! I'll review the PR, but I don't think we will need it in the builder, if that makes sense. Since we always check for alignments when we are building the buffer.

mustiikhalil avatar Nov 14 '22 17:11 mustiikhalil

FlatBuffers internally aligns the data, and assumes the buffer itself is placed on an aligned boundary. So depending on how the network buffer lays out stuff, it may start on an unaligned boundary. However, that's outside the control of flatbuffers itself, so I would prefer the solution to be outside flatbuffers as well.

I don't know swift, but specifying the buffer from memory should occur outside of flatbuffers and fed into the getRootAs<type>() method. So do any sort of loadUnAlign call there? Correct me if I am wrong on how the swift API works.

dbaileychess avatar Nov 14 '22 17:11 dbaileychess

Basically swift support accessing the underlying data unaligned (see the very minimal patch using the new swift 5.7 feature for this https://github.com/google/flatbuffers/pull/7641/files) - when using this the swift implementation also handles when the buffer is placed on an unaligned boundary - this is really desirable for any high performance streaming/serialized data as requiring to place the buffer on an aligned boundary forces a copy in such cases, which is unfortunate given flatbuffers fundamental support for zero-copy. Supporting this is basically free and older swift versions would work just as today (i.e. don’t support unaligned access).

hassila avatar Nov 14 '22 18:11 hassila

I've removed loadUnaligned from builder.

mr-swifter avatar Nov 14 '22 18:11 mr-swifter

Strictly speaking [loadUnaligned](https://github.com/apple/swift-evolution/blob/main/proposals/0349-unaligned-loads-and-stores.md) is slower then load as stated in the linked document:

We believe that Swift's goals align well with having the more performant function (aligned load) be the default one.

FlatBuffers first principal is being fast. Bit-packing is not part of FlatBuffers goals. So when multiple buffers are concatenated in one stream, or file, they should be padded. This is IMHO what a high performance application should do if it chose to use FlatBuffers in the first place. Switching to loadUnaligned in func read<T>(def: T.Type, position: Int) -> T will tax every user of FlatBuffers for no benefit.

mzaks avatar Nov 17 '22 06:11 mzaks

To stress why loadUnaligned is slower than load. It internally does a memory copy, which would be propagated to every scalar type read if implemented in func read<T>(def: T.Type, position: Int) -> T

https://github.com/apple/swift/blob/958832dc2b538dce8c9da26de22c6cfa78760b4e/stdlib/public/core/UnsafeRawPointer.swift#L1250

mzaks avatar Nov 17 '22 06:11 mzaks

To stress why loadUnaligned is slower than load. It internally does a memory copy, which would be propagated to every scalar type read if implemented in func read<T>(def: T.Type, position: Int) -> T

https://github.com/apple/swift/blob/958832dc2b538dce8c9da26de22c6cfa78760b4e/stdlib/public/core/UnsafeRawPointer.swift#L1250

One fix could simply be to check if the base pointer is aligned and then use aligned load, otherwise fall back on unaligned - then it's a single branch in cost. Padding on the stream is not always possible and is quite limiting - e.g. when storing things in a KV database / on the filesystem, even if you pad there's not guarantees that the data will be laid out concatenated and it would be great to be able to do zero copy in those instances.

Agree that forcing a memcpy always is bad, but perhaps an alignment check and picking the required function would be ok?

hassila avatar Nov 17 '22 07:11 hassila

but perhaps an alignment check and picking the required function would be ok?

We do have a verifier that makes sure that the buffer is align and throws if the buffer isnt aligned in swift. Check getCheckedRoot

mustiikhalil avatar Nov 17 '22 13:11 mustiikhalil

@mzaks, it's very good point regarding loadUnaligned being slower than load! Thanks! Let me try to fallback to loadUnaligned only when required.

I would say in general padding is not universal solution, especially if memory management is done by 3rd party library. For example, I use librdkafka library to communicate with Kafka cluster and have no control how and where my messages in flatbuffers will be placed in memory by the library, but still want to avoid copying where reading this data.

mr-swifter avatar Nov 17 '22 14:11 mr-swifter

Hm, I see the problem. But then, isn't it cheaper to copy a full buffer once (given it is not tremendously large). Then copy each scalar value (death by a thousand cuts)? Say we are very unlucky and the buffer starts at byte 7. Every seconds 4 byte value is misaligned as FlatBuffers does the alignment and padding. And FlatBuffers has lots of 4 byte values :) as every offset and length is a 4 byte value.

mzaks avatar Nov 17 '22 14:11 mzaks

Fair point, but It may still be worth it for many use cases (we have fairly big flatbuffers at times!) - but also you can avoid a malloc if you want to pass around a reference - some times we just want to read a few fields and the write the flatbuffers to a different destination (eg. Read from Kafka, read a few fields, write to local database or to socket)

hassila avatar Nov 17 '22 15:11 hassila

@mr-swifter

Sorry i've been pretty busy lately, but as a solution would this work: (Not sure of the syntax), but we can figure it out within the PR

struct ByteBuffer {
  public var isUnsafeMemoryReadingEnabled = false

  func read() {
     return ...
  }
  @available(5.7)
  func read() {
    if isUnsafeMemoryReadingEnabled {
     return ...
    } else {
     return ...
    }
  }
}

mustiikhalil avatar Dec 03 '22 12:12 mustiikhalil

@mzaks what do you think of the proposed solution, since we would love to keep the API as is, but at the same time, it would give users a bit more freedom to read directly if they dont want to align the memory.

mustiikhalil avatar Dec 03 '22 12:12 mustiikhalil

I think it’s a good alternative. Do we actually have performance tests, which we can run to make sure we don’t degrade?

mzaks avatar Dec 03 '22 17:12 mzaks

Actually I think proper conditional compilation would be better. The use case for unsafe reading can be defined as environment variable.

mzaks avatar Dec 03 '22 17:12 mzaks

Yeah that would also be even better! But would require changes within the package.swift which is completely fine.

It would also be nice to run the tests twice to verify that both, normal and unsafe reading is working as expected (on the CI).

func read() -> T {
#if UNSAFE_MEMORY_READING_ENABLED
 return ...
#else
 return ...
#endif
}

mustiikhalil avatar Dec 03 '22 22:12 mustiikhalil

Hm, since we know that flatbuffer fields are always properly aligned in respect of each other, i guess then we can understand what function (load or loadUnaligend) should be always used during ByteBuffer init. In this sense I prefer the way what @mustiikhalil , since it can take advantage of memory aligned if possible.

Regarding performance impact I can do benchmark test and if the difference is not significant should we stick to the adaptive approach?

If there is a difference we can go with environment variable way

What do you think @mzaks , @mustiikhalil ?

mr-swifter avatar Dec 05 '22 10:12 mr-swifter

I am considered with having a condition in a read method, which is probably the hottest method we have. This is why I suggested to have a conditional compilation. That sad my concerns regarding possible branch mis predictions might be unfounded, this is why it would be great if @mr-swifter you could provide a good performance test which can disprove my assumptions.

I created a repo 6 years ago, which I used to benchmark my own implementation of FlatBuffers, maybe it can be adopted https://github.com/mzaks/FlatBuffersSwiftPerformanceTest/tree/master/FBTest

mzaks avatar Dec 05 '22 16:12 mzaks

Hi! As far as i see the main concern regarding the PR is a possible performance degradation. I did some investigations without performance test, just checking the assembler code emitted by the swift compiler Here is a swift code:

func test_func() -> Void {
    let ptr = UnsafeMutableRawPointer.allocate(byteCount: 20, alignment:0)
    defer { ptr.deallocate() }
    for i in 0..<20 { ptr.storeBytes(of: UInt8(i), toByteOffset: i, as: UInt8.self) }
    let v = ptr.loadUnaligned(fromByteOffset: 1, as: UInt32.self)
    print("v=\(v)")
}

And here is an assembler code, built with optimisation (with swift 5.7.2)

    0x100003dc8 <+36>:  ldr    q0, 0x100003f70
    0x100003dcc <+40>:  str    q0, [x0]
    0x100003dd0 <+44>:  mov    w8, #0x1110
    0x100003dd4 <+48>:  movk   w8, #0x1312, lsl #16
    0x100003dd8 <+52>:  str    w8, [x0, #0x10]
    0x100003ddc <+56>:  ldur   w20, [x0, #0x1]

first 5 lines are actually a loop filling the buffer and the last line is a 'loadUnaligned' changing 'loadUnaligned' to just 'load' does not affect the assembler code in that particular case, it is always the same.

It seems to me the performance degradation in case of using 'loadUnaligned' instead of 'load' with release build is very unlikely. Probably the debug build will be a little bit slower, but does it matter?

Checking alignment for each read would be mess for the brunch predictor.

Seems to me the PR is good...

ser-0xff avatar Jan 12 '23 14:01 ser-0xff

@ser-0xff i would suggest also running the benchmark on both master and change branch and just verifying that it would run fast when running on release and no performance degradation is happening. Also running the fuzzer would be really appreciated. because as mentioned above, i would say using a flag would be the best approach here. Since not everyone would want to run the changes

mustiikhalil avatar Jan 12 '23 15:01 mustiikhalil

@ser-0xff could you please provide the way you generate the assembler code?

The problem with your example is that the compiler can inline all the copying and boundary checks as your example is static. You are probably compiled for an 64bit arch, the initial pointer is aligned, you read 4 bytes with an offset of 1 byte, so you are not crossing any boundaries.

Following test would be much more interesting:

func test_func2() -> Void {
    let ptr = UnsafeMutableRawPointer.allocate(byteCount: 200, alignment:0)
    defer { ptr.deallocate() }
    for i in 0..<200 { ptr.storeBytes(of: UInt8(i), toByteOffset: i, as: UInt8.self) }
    for i in 0..<50 {
        let v = ptr.loadUnaligned(fromByteOffset: i, as: UInt32.self)
        print("v=\(v)")
    }
}

As I guess it can't just unroll and inline calls to loadUnaligned.

That sad. The docs for ARM64 LDUR says that the offset can be in the range -256 to 255, so maybe it actual is able to read misaligned data without any issues/overhead, but this is architecture specific behaviour, what about Intel, or a different (old, or more exotic e.g. watchOS) ARM architecture?

mzaks avatar Jan 13 '23 14:01 mzaks

load_test % cat test2.swift 
import Foundation

func test_func2() -> Void {
    let ptr = UnsafeMutableRawPointer.allocate(byteCount: 200, alignment:0)
    defer { ptr.deallocate() }
    for i in 0..<200 { ptr.storeBytes(of: UInt8(i), toByteOffset: i, as: UInt8.self) }
    for i in 0..<50 {
        let v = ptr.loadUnaligned(fromByteOffset: i, as: UInt32.self)
        print("v=\(v)")
    }
}

test_func2()

load_test % swiftc -O test2.swift 

Looked at assembler using both lldb and objdump. The loop with 'loadUnaligned' looks like

    0x100003d14 <+244>: add    x28, x8, #0x1
    0x100003d18 <+248>: ldr    w20, [x19, x8]
    0x100003d1c <+252>: mov    x0, x21
    0x100003d20 <+256>: mov    w1, #0x40
    0x100003d24 <+260>: mov    w2, #0x7
    0x100003d28 <+264>: bl     0x100003e6c               ; symbol stub for: swift_allocObject
    0x100003d2c <+268>: mov    x23, x0
    0x100003d30 <+272>: ldr    q0, [sp]
    0x100003d34 <+276>: str    q0, [x0, #0x10]
    0x100003d38 <+280>: stp    x25, x26, [sp, #0x20]
    0x100003d3c <+284>: str    w20, [sp, #0x18]
    0x100003d40 <+288>: bl     0x100003e08               ; lazy protocol witness table accessor for type Swift.UInt32 and conformance Swift.UInt32 : Swift.BinaryInteger in Swift
    0x100003d44 <+292>: mov    x1, x0
    0x100003d48 <+296>: add    x20, sp, #0x18
    0x100003d4c <+300>: mov    x0, x22
    0x100003d50 <+304>: bl     0x100003e54               ; symbol stub for: Swift.BinaryInteger.description.getter : Swift.String
    0x100003d54 <+308>: mov    x24, x1
    0x100003d58 <+312>: add    x20, sp, #0x20
    0x100003d5c <+316>: bl     0x100003e48               ; symbol stub for: Swift.String.append(Swift.String) -> ()
    0x100003d60 <+320>: mov    x0, x24
    0x100003d64 <+324>: bl     0x100003e78               ; symbol stub for: swift_bridgeObjectRelease
    0x100003d68 <+328>: ldp    x8, x9, [sp, #0x20]
    0x100003d6c <+332>: str    x27, [x23, #0x38]
    0x100003d70 <+336>: stp    x8, x9, [x23, #0x20]
    0x100003d74 <+340>: mov    x0, x23
    0x100003d78 <+344>: mov    w1, #0x20
    0x100003d7c <+348>: mov    x2, #-0x1f00000000000000
    0x100003d80 <+352>: mov    w3, #0xa
    0x100003d84 <+356>: mov    x4, #-0x1f00000000000000
    0x100003d88 <+360>: bl     0x100003e60               ; symbol stub for: Swift.print(_: Any..., separator: Swift.String, terminator: Swift.String) -> ()
    0x100003d8c <+364>: mov    x0, x23
    0x100003d90 <+368>: bl     0x100003e9c               ; symbol stub for: swift_release
    0x100003d94 <+372>: mov    x8, x28
    0x100003d98 <+376>: cmp    x28, #0x32
    0x100003d9c <+380>: b.ne   0x100003d14               ; <+244>

value is being load into register w20 at second line, remaining code is string formatting. Again, no any checks, just a single instruction...

And regarding other architectures... The question is is there any checks supposed to be done by 'loadUnaligned' or not. I did not found anything in the swift standard library documentation. If not - then I think 'loadUnaligned' will work in the same way as ReadScalar() in C++ implementation... https://github.com/google/flatbuffers/blob/master/include/flatbuffers/base.h#L418

There are no any alignment checks and as far as i understand it works fine... I guess set of architectures supported by C++ port is wider than supported by swift...

ser-0xff avatar Jan 13 '23 15:01 ser-0xff

This issue is stale because it has been open 6 months with no activity. Please comment or label not-stale, or this will be closed in 14 days.

github-actions[bot] avatar Jul 14 '23 20:07 github-actions[bot]