FlatBuffersSwift icon indicating copy to clipboard operation
FlatBuffersSwift copied to clipboard

Thoughts on breaking usage convention for `Data.withUnsafeBytes`?

Open anohren opened this issue 7 years ago β€’ 8 comments

It seems to me that this initializer is not following the documentation of withUnsafeBytes which says not to use the pointer outside the passed closure. What's the reason for writing the init like this? Of course, in absence of a comment, I don't know of its specific use case.

AFAIK Data doesn't guarantee using contiguous memory. What are the implications of that here?

Do you happen to know why the pointer shouldn't be accessed outside the closure?

   public init(data : Data, cache : FBReaderCache? = FBReaderCache()) {
       self.count = data.count
       self.cache = cache
       var pointer : UnsafePointer<UInt8>! = nil
       data.withUnsafeBytes { (u8Ptr: UnsafePointer<UInt8>) in
           pointer = u8Ptr
       }
       self.buffer = UnsafeRawPointer(pointer)
   }

anohren avatar Jan 08 '17 09:01 anohren

Honest answer: I got this from StackOverflow ☺️.

Now a more constructive answer. public init(data : Data, cache : FBReaderCache? = FBReaderCache()) is a convenience initialiser for users, who have Data object on there hands. This should be 90% of the use cases for iOS developers. I am not deeply familiar with Data API so the solution listed above seemed to be rational to me. It also worked fine in unit tests.

Now I browsed through Data API again and found the copyBytes:to method. Using this API we could have following implementation:

private let originalBuffer : UnsafeMutableBufferPointer<UInt8>!

public init(data : Data, cache : FBReaderCache? = FBReaderCache()) {
    self.count = data.count
    self.cache = cache
        
    let pointer : UnsafeMutablePointer<UInt8> = UnsafeMutablePointer.allocate(capacity: data.count)
    self.originalBuffer = UnsafeMutableBufferPointer(start: pointer, count: data.count)
    _ = data.copyBytes(to: originalBuffer)
    self.buffer = UnsafeRawPointer(pointer)
}

public func freeMemory(){
    if let originalBuffer = originalBuffer,
        let pointer = originalBuffer.baseAddress {
        pointer.deinitialize(count: count)
    }
}

This implementation is safer and seems to solve the data ownership problem. In previous versions user had to keep a reference to data around if they wanted to use _Direct accessors. Now the Reader copies the data and keeps it's own reference originalBuffer around until user calls freemMemory. This creates a small issue. Copy makes it safe, but also a bit inefficient. A FlatBuffer can be up to 2GB big. in case of big buffers user probably want to avoid the copying. I guess in this case they have to use public init(buffer : UnsafeRawPointer, count : Int, cache : FBReaderCache? = FBReaderCache())

I will think about it a bit more, but I guess it would be a good enhancement, and I will have to change the code generator a bit.

Also I am not 100% sure about freeMemory I would put this code in deinit but FBMemoryReader is a struct. And structs can't have deinit. Here I am open to suggestions πŸ˜‰

mzaks avatar Jan 08 '17 14:01 mzaks

I have by no means used the Swift pointer API extensively, and I don't know this codebase, so with the risk of sounding like I don't know what I'm doing πŸ™‚:

  1. Can't we skip originalBuffer completely in your suggestion?
    let pointer = UnsafeMutablePointer<UInt8>.allocate(capacity: data.count)
    data.copyBytes(to: pointer, count: data.count)
    self.buffer = UnsafeRawPointer(pointer)
  1. And can't we use an UnsafeMutablePointer<UInt8> as self.buffer?

That would shorten the above to

    self.buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: data.count)
    data.copyBytes(to: self.buffer, count: data.count)

From glancing over the code I can only find one use of the fact that self.buffer is UnsafeRawPointer: in the method fromByteArray, for which I can find no usage at all.

The fromByteArray looks to me like it could suffer from alignment problems, by the way. Not sure about its intended usage though.

  1. Shouldn't we use deallocate in freeMemory?

  2. I think that if you send a 2 GB payload then you would have to know what you're doing, and not do it in memory constrained situations. Is that a common use case?

  3. Inefficiency of copying

From what I've gathered by asking on StackOverflow, the withUnsafeBytes already copies the data into contiguous memory. After all, I can't imagine how it would work otherwise. Bottom line, your proposal shouldn't be less efficient.

  1. I'm not sure about freeMemory - I don't know the responsibilities of FBMemoryReader. Is it unthinkable to make it a final class?

In general, if you'd add some comments to types and important functions it'd be interesting to look at.

anohren avatar Jan 09 '17 02:01 anohren

And to answer myself

2: Disregard. Seems that Github just didn't want to show me correct results when searching for "fromByteArray". Should have relied on Chrome search instead of Github's UI team ><. Also realized there is some black box code generation in this project.

Read up on flatbuffers and it seems they somehow accounts for memory alignment? Beats me how, I was led to believe they weren't invariant between architectures. But that's convenient!

anohren avatar Jan 09 '17 04:01 anohren

FWIW, I don't think it would be unreasonable to take a local copy (and free it accordingly) of the Data as this is a usability convenience initialiser - anyone caring about zero-copy and high performance would opt for the UnsafeRawPointer initialiser (even someone holding a Data object outside could always use that as well by using a data.withUnsafeBytes closure that encapsulates the whole usage of the FBMemoryReader if needed). The current code is basically unsafe though, good catch.

Also, that being said, "withUnsafeBytes" wouldn't need to copy necessarily if Data knows that the bytes are contiguous in memory, so I would not assume that is being done just based on SO.

Wrt to freeMemory(), shouldn't that simple be done in deinit{} ?

"Can't we skip originalBuffer completely in your suggestion?" One would want to keep track of whether we have allocated the buffer storage ourself and is responsible for releasing it (convenience Data initialiser) and to not dealloc if a raw pointer was used (common initaliser) - whether a boolean or separate pointer doesn't matter, but we would need to know so we only free the buffer for the convenience case.

"The fromByteArray looks to me like it could suffer from alignment problems" The base pointer for the whole flatbuffer would need to be aligned (to "x") and then the flatbuffer format itself guarantees that any subfields accessed are properly aligned - it is not truly invariant but a layout was chosen that should work across all commonly used architectures (see http://google.github.io/flatbuffers/flatbuffers_internals.html).

hassila avatar Jan 10 '17 11:01 hassila

@anohren Will respond point by point πŸ™‚

  1. The Data API doesn't work this way, it expects a BufferPointer which is probably for the better. I guess/hope that when you have a BufferPointer the memory region in this buffer is retained. This is also why we need to free it.
  2. In FlatBufferSwift for Swift2 I actually had it this way, because there was no RawPointer. And as an interesting side effect, I did not had to have full memory alignment. I could just do full bit packing, which worked fine on x64 and Arm64 architectures but had problems on Armv7 (32bit). RawPointer will not accept unaligned access on any architecture, so it is basically much safer. Little bit on background of memory alignment in FlatBuffers. When a buffer is built, it will dynamically figure out the alignment and add padding. This is the method which make sure that alignment works: https://github.com/mzaks/FlatBuffersSwift/blob/master/FlatBuffersSwift/FlatBufferBuilder.swift#L103
  3. To be hones I am a bit confused by the API I guess you are right we need to use deinitialise and than deallocate maybe just deallocate is also enough, but the documentation states that deallocate should be called on not initialised memory. If you have a good read on this topic to refer I would be grateful.
  4. I am using FlatBuffers also for storing state and also has this example (https://github.com/mzaks/FindCityFB) where I converted 180MB CSV file containing a list of all city names in the world to FlatBuffer and can search for cities by name or country under 4 milliseconds. There is a link to video in the read me file. So in this scenario copying 200MB files would be a bad idea. However I must admit I haven't ported this example to Swift3 yet.
  5. Also relates to question from @hassila. My performance benchmarks (https://github.com/mzaks/FlatBuffersSwiftPerformanceTest) shows that if FBMemoryReader is a final class it is a bit slower than if it is a struct. However I think it changed a bit with the latest Swift version. It is still a few milliseconds slower, but this is not that significant. In this case I can make FBMemoryReader a final class and than move the logic from freeMemory to deinit

Also @anohren if you are interested in how the builder and reader are used I have a couple of integration tests which work with generated classes: https://github.com/mzaks/FlatBuffersSwift/blob/master/FlatBuffersSwiftTests/contacts_fb.swift https://github.com/mzaks/FlatBuffersSwift/blob/master/FlatBuffersSwiftTests/friends_fb.swift

The first one is a complex example which use most of FlatBuffers features but is a hierarchically structured https://github.com/mzaks/FlatBuffersSwift/blob/master/FlatBuffersSwiftTests/contacts.fbs

The second one is an example of a complex cyclical graph. https://github.com/mzaks/FlatBuffersSwift/blob/master/FlatBuffersSwiftTests/friends.fbs

Documenting the API and also udtaing the Wiki is also on my list 😊

mzaks avatar Jan 10 '17 15:01 mzaks

Interesting stuff, thanks for replying.

Good and bad news is that the implementation of Data seems to just hand off a pointer to its backing data without any copying

    public func withUnsafeBytes<ResultType, ContentType>(_ body: (UnsafePointer<ContentType>) throws -> ResultType) rethrows -> ResultType {
        let bytes =  _backing.bytes ?? UnsafeRawPointer(bitPattern: 0xBAD0)!
        let contentPtr = bytes.bindMemory(to: ContentType.self, capacity: count / MemoryLayout<ContentType>.stride)
        return try body(contentPtr)
    }

So it might not break...yet.

  1. The code I posted actually compiles, so the API has to partly work that way :) at least with regards to pointer type (which is not a BufferPointer).

  2. I think that deinitialization is not necessary since there are no nontrivial types residing anywhere in the allocated space, i.e. no possibility of memory that needs to be freed indirectly due to refcount. That's just intuition -- I can't say I know about the implementation details of Swift.

I can recommend that document for further reading though. It covers most topics.

At first I thought strings constructed with String.init(bytesNoCopy: ... would complain if freeing the memory underneath them, but I didn't manage to break it in playground. I'm starting to doubt I understand the meaning of "NoCopy".

  1. I think @hassila was onto something there. Let the user decide to be unsafe if she wants to, and you can just rename this parameter to copyBytesOf data: Data or something that makes this behavior clear.

anohren avatar Jan 10 '17 21:01 anohren

  1. I was probably unclear, I never advocated supporting un unsafe interface - just pointing out that someone having a Data object and wanting to avoid a copy from our convenience interface (the Data based initialiser) could always use the standard no-copy pointer initialiser by putting all relevant code on the call site that access the reader within data.withUnsafeBytes {} (then it would still be safe).

Another alternative would be to change the reader implementation to always use data.withUnsafeBytes instead of directly accessing the stored pointer, then it would be safe and no copy would be needed, but it could have performance implications if not done carefully and would probably need some measuring...

hassila avatar Jan 11 '17 08:01 hassila

Nope, I was unclear. I agree with you, and proposed to rename this init parameter to copyBytesOf data: Data to make your behavior clear.

anohren avatar Jan 11 '17 08:01 anohren