flatbuffers
flatbuffers copied to clipboard
[swift] support for unaligned buffers
Usually when reading data from network or from binary file there is no guarantee that data will be placed in memory with respect to alignment. So, trying to use such memory buffers as is will lead to precondition.
In swift 5.7 there is a new loadUnaligned function is available, so plain data types can be accessed without respect to alignment. So, seems that means by using loadUnaligned
we can create flatbuffer from any buffer and it will work normally.
So, I suggest to start using loadUnaligned
instead of load
. storeBytes
works without changing function signature. Do i miss anything?
So flatbuffers usually do align the buffers according to what they will be used for. For example here, where we Pre-align the buffer to write the data properly. So in terms of flatbuffers being safe to be written in swift they are.
@inline(__always)
@discardableResult
mutating public func push<T: Scalar>(element: T) -> UOffset {
let size = MemoryLayout<T>.size
preAlign(
len: size,
alignment: size)
_bb.push(value: element, len: size)
return _bb.size
}
Regarding the reading, we should always verify the buffer especially if they are coming from a network. which uses the following function:
public func getCheckedRoot<T: FlatBufferObject & Verifiable>(
byteBuffer: inout ByteBuffer,
fileId: String? = nil,
options: VerifierOptions = .init()) throws -> T
{
var verifier = try Verifier(buffer: &byteBuffer, options: options)
if let fileId = fileId {
try verifier.verify(id: fileId)
}
try ForwardOffset<T>.verify(&verifier, at: 0, of: T.self)
return T.init(
byteBuffer,
o: Int32(byteBuffer.read(def: UOffset.self, position: byteBuffer.reader)) +
Int32(byteBuffer.reader))
}
This will always check for alignment, and content according to what the code is supposed to actual read. By default we always check for alignment, but if you want to skip that, you can always re-write the function yourself.
@dbaileychess please do correct me if i'm wrong regarding writing into a flatbuffer and that the values are always aligned
By default we always check for alignment, but if you want to skip that, you can always re-write the function yourself.
I'm not sure I understand what do you mean by "re-write the function yourself". It's flatbuffers API function, isn't it?
Maybe I should be a bit more precise in use case I want to fix. For example, I have something like:
func handleMessage(message: UnsafeMutableRawPointer, capacity: Int) {
// handle function called with pointer to some buffer allocated and read from network somewhere.
// I don't what to copy it, so I want to create flat buffer message on top of it
let byteBuffer = ByteBuffer(assumingMemoryBound: message, capacity: capacity)
// Now byte buffer keep unowned reference to memory buffer, let's try to create flatbuffer
let myMessage = MyMessage.getRootAsMyMessage(bb: byteBuffer)
// And I get precondition here!
I don't see much difference vs getCheckedRoot
since any read, for example, byteBuffer.read(def: UOffset.self, position: byteBuffer.reader)
can cause the same problem since it ends up with loading UOffset
from memory.
@mr-swifter
I'm not sure I understand what do you mean by "re-write the function yourself". It's flatbuffers API function, isn't it?
Yes it is, but it's written that way so that users can copy and paste the code and modify it as they please. for example, the verifier struct can also be created manually and from there we can set the alignment checking to false.
Is the message aka UnsafeMutableRawPointer
a valid single object flatbuffer?
Because I do understand, the fact that there is an issue reading the buffer, but if the buffer wasn't aligned properly first then the issue might be in creating the buffer, or the buffer isn't complete. and honestly i would rather have the precondition fail safe rather than just allowing unaligned reading from memory.
But let's see what @dbaileychess would say regarding the alignment. If you have example of the buffer that would be also great!
I don't see much difference vs getCheckedRoot since any read, for example, byteBuffer.read(def: UOffset.self, position: byteBuffer.reader) can cause the same problem since it ends up with loading UOffset from memory.
Yes it does read the Uoffset but after it validates if its a proper Uoffset that will not break the app. If you look at this you will see we are validating the UOffset first, and validating that the UOffset is within the buffer
try ForwardOffset<T>.verify(&verifier, at: 0, of: T.self)
@mustiikhalil the flatbuffer message will be appropriately aligned internally (and the message is valid), but when reading multiple messages streaming from e.g. a network buffer the starting position for a given message is not necessarily aligned appropriately when they are sequentially laid out in memory - using the loadUnaligned function as suggested allows for direct access of the memory as a flatbuffers message without requiring making a copy (which is one of the main features of flatbuffers!) and is a re ent addition to Swift (5.7 iirc). The same may be an issue if mmaping a file containing flatbuffers messages sequentially laid out. Using load unaligned allows these use cases to work as intended too….
Yes, exactly, thanks @hassila !
I think the test I've added visualise the issue:
func testUnalignedRead() {
...
let fbb = createMonster(withPrefix: false)
...
// Unaligned read
let testUnaligned: () -> Bool = {
var bytes: [UInt8] = [0x00]
bytes.append(contentsOf: fbb.sizedByteArray)
return bytes.withUnsafeMutableBytes { ptr in
guard var baseAddress = ptr.baseAddress else {
XCTFail("Base pointer is not defined")
return false
}
baseAddress = baseAddress.advanced(by: 1)
let unlignedPtr = UnsafeMutableRawPointer(baseAddress)
var monster = Monster.getRootAsMonster(bb:
ByteBuffer(assumingMemoryBound: unlignedPtr, capacity: ptr.count - 1))
self.readFlatbufferMonster(monster: &monster)
return true
}
}
...
Here I emulate this, by creating buffer, move it by one byte (like I got from network buffer with such offset) and try to access this correct flatbuffer, but get precondition on first load of UInt32, since starting point is not aligned to uint32 memory layout.
Makes sense! I'll review the PR, but I don't think we will need it in the builder, if that makes sense. Since we always check for alignments when we are building the buffer.
FlatBuffers internally aligns the data, and assumes the buffer itself is placed on an aligned boundary. So depending on how the network buffer lays out stuff, it may start on an unaligned boundary. However, that's outside the control of flatbuffers itself, so I would prefer the solution to be outside flatbuffers as well.
I don't know swift, but specifying the buffer from memory should occur outside of flatbuffers and fed into the getRootAs<type>()
method. So do any sort of loadUnAlign call there? Correct me if I am wrong on how the swift API works.
Basically swift support accessing the underlying data unaligned (see the very minimal patch using the new swift 5.7 feature for this https://github.com/google/flatbuffers/pull/7641/files) - when using this the swift implementation also handles when the buffer is placed on an unaligned boundary - this is really desirable for any high performance streaming/serialized data as requiring to place the buffer on an aligned boundary forces a copy in such cases, which is unfortunate given flatbuffers fundamental support for zero-copy. Supporting this is basically free and older swift versions would work just as today (i.e. don’t support unaligned access).
I've removed loadUnaligned
from builder.
Strictly speaking [loadUnaligned](https://github.com/apple/swift-evolution/blob/main/proposals/0349-unaligned-loads-and-stores.md)
is slower then load as stated in the linked document:
We believe that Swift's goals align well with having the more performant function (aligned load) be the default one.
FlatBuffers first principal is being fast. Bit-packing is not part of FlatBuffers goals. So when multiple buffers are concatenated in one stream, or file, they should be padded. This is IMHO what a high performance application should do if it chose to use FlatBuffers in the first place. Switching to loadUnaligned
in func read<T>(def: T.Type, position: Int) -> T
will tax every user of FlatBuffers for no benefit.
To stress why loadUnaligned
is slower than load
. It internally does a memory copy, which would be propagated to every scalar type read if implemented in func read<T>(def: T.Type, position: Int) -> T
https://github.com/apple/swift/blob/958832dc2b538dce8c9da26de22c6cfa78760b4e/stdlib/public/core/UnsafeRawPointer.swift#L1250
To stress why
loadUnaligned
is slower thanload
. It internally does a memory copy, which would be propagated to every scalar type read if implemented infunc read<T>(def: T.Type, position: Int) -> T
https://github.com/apple/swift/blob/958832dc2b538dce8c9da26de22c6cfa78760b4e/stdlib/public/core/UnsafeRawPointer.swift#L1250
One fix could simply be to check if the base pointer is aligned and then use aligned load, otherwise fall back on unaligned - then it's a single branch in cost. Padding on the stream is not always possible and is quite limiting - e.g. when storing things in a KV database / on the filesystem, even if you pad there's not guarantees that the data will be laid out concatenated and it would be great to be able to do zero copy in those instances.
Agree that forcing a memcpy always is bad, but perhaps an alignment check and picking the required function would be ok?
but perhaps an alignment check and picking the required function would be ok?
We do have a verifier that makes sure that the buffer is align and throws if the buffer isnt aligned in swift.
Check getCheckedRoot
@mzaks, it's very good point regarding loadUnaligned
being slower than load
! Thanks! Let me try to fallback to loadUnaligned
only when required.
I would say in general padding is not universal solution, especially if memory management is done by 3rd party library. For example, I use librdkafka library to communicate with Kafka cluster and have no control how and where my messages in flatbuffers will be placed in memory by the library, but still want to avoid copying where reading this data.
Hm, I see the problem. But then, isn't it cheaper to copy a full buffer once (given it is not tremendously large). Then copy each scalar value (death by a thousand cuts)? Say we are very unlucky and the buffer starts at byte 7. Every seconds 4 byte value is misaligned as FlatBuffers does the alignment and padding. And FlatBuffers has lots of 4 byte values :) as every offset and length is a 4 byte value.
Fair point, but It may still be worth it for many use cases (we have fairly big flatbuffers at times!) - but also you can avoid a malloc if you want to pass around a reference - some times we just want to read a few fields and the write the flatbuffers to a different destination (eg. Read from Kafka, read a few fields, write to local database or to socket)
@mr-swifter
Sorry i've been pretty busy lately, but as a solution would this work: (Not sure of the syntax), but we can figure it out within the PR
struct ByteBuffer {
public var isUnsafeMemoryReadingEnabled = false
func read() {
return ...
}
@available(5.7)
func read() {
if isUnsafeMemoryReadingEnabled {
return ...
} else {
return ...
}
}
}
@mzaks what do you think of the proposed solution, since we would love to keep the API as is, but at the same time, it would give users a bit more freedom to read directly if they dont want to align the memory.
I think it’s a good alternative. Do we actually have performance tests, which we can run to make sure we don’t degrade?
Actually I think proper conditional compilation would be better. The use case for unsafe reading can be defined as environment variable.
Yeah that would also be even better! But would require changes within the package.swift
which is completely fine.
It would also be nice to run the tests twice to verify that both, normal and unsafe reading is working as expected (on the CI).
func read() -> T {
#if UNSAFE_MEMORY_READING_ENABLED
return ...
#else
return ...
#endif
}
Hm, since we know that flatbuffer fields are always properly aligned in respect of each other, i guess then we can understand what function (load or loadUnaligend) should be always used during ByteBuffer init. In this sense I prefer the way what @mustiikhalil , since it can take advantage of memory aligned if possible.
Regarding performance impact I can do benchmark test and if the difference is not significant should we stick to the adaptive approach?
If there is a difference we can go with environment variable way
What do you think @mzaks , @mustiikhalil ?
I am considered with having a condition in a read method, which is probably the hottest method we have. This is why I suggested to have a conditional compilation. That sad my concerns regarding possible branch mis predictions might be unfounded, this is why it would be great if @mr-swifter you could provide a good performance test which can disprove my assumptions.
I created a repo 6 years ago, which I used to benchmark my own implementation of FlatBuffers, maybe it can be adopted https://github.com/mzaks/FlatBuffersSwiftPerformanceTest/tree/master/FBTest
Hi! As far as i see the main concern regarding the PR is a possible performance degradation. I did some investigations without performance test, just checking the assembler code emitted by the swift compiler Here is a swift code:
func test_func() -> Void {
let ptr = UnsafeMutableRawPointer.allocate(byteCount: 20, alignment:0)
defer { ptr.deallocate() }
for i in 0..<20 { ptr.storeBytes(of: UInt8(i), toByteOffset: i, as: UInt8.self) }
let v = ptr.loadUnaligned(fromByteOffset: 1, as: UInt32.self)
print("v=\(v)")
}
And here is an assembler code, built with optimisation (with swift 5.7.2)
0x100003dc8 <+36>: ldr q0, 0x100003f70
0x100003dcc <+40>: str q0, [x0]
0x100003dd0 <+44>: mov w8, #0x1110
0x100003dd4 <+48>: movk w8, #0x1312, lsl #16
0x100003dd8 <+52>: str w8, [x0, #0x10]
0x100003ddc <+56>: ldur w20, [x0, #0x1]
first 5 lines are actually a loop filling the buffer and the last line is a 'loadUnaligned' changing 'loadUnaligned' to just 'load' does not affect the assembler code in that particular case, it is always the same.
It seems to me the performance degradation in case of using 'loadUnaligned' instead of 'load' with release build is very unlikely. Probably the debug build will be a little bit slower, but does it matter?
Checking alignment for each read would be mess for the brunch predictor.
Seems to me the PR is good...
@ser-0xff i would suggest also running the benchmark on both master
and change branch
and just verifying that it would run fast when running on release and no performance degradation is happening. Also running the fuzzer would be really appreciated. because as mentioned above, i would say using a flag would be the best approach here. Since not everyone would want to run the changes
@ser-0xff could you please provide the way you generate the assembler code?
The problem with your example is that the compiler can inline all the copying and boundary checks as your example is static. You are probably compiled for an 64bit arch, the initial pointer is aligned, you read 4 bytes with an offset of 1 byte, so you are not crossing any boundaries.
Following test would be much more interesting:
func test_func2() -> Void {
let ptr = UnsafeMutableRawPointer.allocate(byteCount: 200, alignment:0)
defer { ptr.deallocate() }
for i in 0..<200 { ptr.storeBytes(of: UInt8(i), toByteOffset: i, as: UInt8.self) }
for i in 0..<50 {
let v = ptr.loadUnaligned(fromByteOffset: i, as: UInt32.self)
print("v=\(v)")
}
}
As I guess it can't just unroll and inline calls to loadUnaligned
.
That sad. The docs for ARM64 LDUR says that the offset can be in the range -256 to 255, so maybe it actual is able to read misaligned data without any issues/overhead, but this is architecture specific behaviour, what about Intel, or a different (old, or more exotic e.g. watchOS) ARM architecture?
load_test % cat test2.swift
import Foundation
func test_func2() -> Void {
let ptr = UnsafeMutableRawPointer.allocate(byteCount: 200, alignment:0)
defer { ptr.deallocate() }
for i in 0..<200 { ptr.storeBytes(of: UInt8(i), toByteOffset: i, as: UInt8.self) }
for i in 0..<50 {
let v = ptr.loadUnaligned(fromByteOffset: i, as: UInt32.self)
print("v=\(v)")
}
}
test_func2()
load_test % swiftc -O test2.swift
Looked at assembler using both lldb and objdump. The loop with 'loadUnaligned' looks like
0x100003d14 <+244>: add x28, x8, #0x1
0x100003d18 <+248>: ldr w20, [x19, x8]
0x100003d1c <+252>: mov x0, x21
0x100003d20 <+256>: mov w1, #0x40
0x100003d24 <+260>: mov w2, #0x7
0x100003d28 <+264>: bl 0x100003e6c ; symbol stub for: swift_allocObject
0x100003d2c <+268>: mov x23, x0
0x100003d30 <+272>: ldr q0, [sp]
0x100003d34 <+276>: str q0, [x0, #0x10]
0x100003d38 <+280>: stp x25, x26, [sp, #0x20]
0x100003d3c <+284>: str w20, [sp, #0x18]
0x100003d40 <+288>: bl 0x100003e08 ; lazy protocol witness table accessor for type Swift.UInt32 and conformance Swift.UInt32 : Swift.BinaryInteger in Swift
0x100003d44 <+292>: mov x1, x0
0x100003d48 <+296>: add x20, sp, #0x18
0x100003d4c <+300>: mov x0, x22
0x100003d50 <+304>: bl 0x100003e54 ; symbol stub for: Swift.BinaryInteger.description.getter : Swift.String
0x100003d54 <+308>: mov x24, x1
0x100003d58 <+312>: add x20, sp, #0x20
0x100003d5c <+316>: bl 0x100003e48 ; symbol stub for: Swift.String.append(Swift.String) -> ()
0x100003d60 <+320>: mov x0, x24
0x100003d64 <+324>: bl 0x100003e78 ; symbol stub for: swift_bridgeObjectRelease
0x100003d68 <+328>: ldp x8, x9, [sp, #0x20]
0x100003d6c <+332>: str x27, [x23, #0x38]
0x100003d70 <+336>: stp x8, x9, [x23, #0x20]
0x100003d74 <+340>: mov x0, x23
0x100003d78 <+344>: mov w1, #0x20
0x100003d7c <+348>: mov x2, #-0x1f00000000000000
0x100003d80 <+352>: mov w3, #0xa
0x100003d84 <+356>: mov x4, #-0x1f00000000000000
0x100003d88 <+360>: bl 0x100003e60 ; symbol stub for: Swift.print(_: Any..., separator: Swift.String, terminator: Swift.String) -> ()
0x100003d8c <+364>: mov x0, x23
0x100003d90 <+368>: bl 0x100003e9c ; symbol stub for: swift_release
0x100003d94 <+372>: mov x8, x28
0x100003d98 <+376>: cmp x28, #0x32
0x100003d9c <+380>: b.ne 0x100003d14 ; <+244>
value is being load into register w20 at second line, remaining code is string formatting. Again, no any checks, just a single instruction...
And regarding other architectures... The question is is there any checks supposed to be done by 'loadUnaligned' or not. I did not found anything in the swift standard library documentation. If not - then I think 'loadUnaligned' will work in the same way as ReadScalar() in C++ implementation... https://github.com/google/flatbuffers/blob/master/include/flatbuffers/base.h#L418
There are no any alignment checks and as far as i understand it works fine... I guess set of architectures supported by C++ port is wider than supported by swift...
This issue is stale because it has been open 6 months with no activity. Please comment or label not-stale
, or this will be closed in 14 days.