protobuf-swift icon indicating copy to clipboard operation
protobuf-swift copied to clipboard

Problem parsing large messages (i.e. 1MB)

Open gegles opened this issue 9 years ago • 10 comments

Version of protoc (protoc --version)

3.0

Version of ProtocolBuffers.framework

ProtoBuf3.0-Swift3.0 branch (but most likely all the branches)

.proto file to reproduce

message Data {
    uint64              offset              = 1;
    bool                last                = 2;
    bytes               data                = 3;
}

Description

If this input.read is not able to read the full rSize of the message but only a subset, the function still returns a message with only the partial data.

At a minimum, this should give an error or a way for users to detect the issue, at best, a mechanism to progressively build the message (in multiple reads) should be provided.

Let me know I am completely off here or if more details are needed.

gegles avatar Jul 13 '16 02:07 gegles

I need some details about this. How do you use this method? Do you have UnitTest or example?

alexeyxo avatar Jul 13 '16 18:07 alexeyxo

I will try to give you more details or spend time building a UT for it.

In the meantime, understand that the problem is very localized to the input.read call I am referring to.

Since you do not check/use the return value (the actual number of bytes read or 0 (EOF) or -1 (error)), you basically assume input.read will always successfully read the full "maxLength" number of bytes. That is not always the case as explained in the InputStream.read() doc.

gegles avatar Jul 13 '16 18:07 gegles

Can you try last version of protobuf-swift repository?

alexeyxo avatar Jul 21 '16 09:07 alexeyxo

Thanks for bring the codebase up to speed with Swift 3.0, I've already moved to it.

I haven't yet tried with large messages, I will soon, but I don't see how anything could be different, since the return value of inputStream.read() is still not being checked.

gegles avatar Jul 23 '16 18:07 gegles

You talk about "length-delimited encoding/decoding"... If your file/stream has a correct format, then all will be ok. Please look test case:

    func testDelimitedEncodingEncoding()
    {
         do {
            let str =  (NSBundle(forClass:TestUtilities.self).resourcePath! as NSString).stringByAppendingPathComponent("delimitedFile.dat")
            let stream = NSOutputStream(toFileAtPath: str, append: false)
            stream?.open()
            for i in 1...100 {
                let mes = try PBProtoPoint.Builder().setLatitude(1.0).setLongitude(Float(i)).build()
                try mes.writeDelimitedToOutputStream(stream!)
            }
            stream?.close()

            let input = NSInputStream(fileAtPath: str)
            input?.open()
            let message = try PBProtoPoint.parseArrayDelimitedFromInputStream(input!)
            XCTAssertTrue(message[1].longitude == 2.0, "")
        }
        catch
        {
            XCTFail("Fail testDelimitedEncodingEncoding")
        }
    }

I compare return value with "!= 1", then return nil(== Invalid ProtocolBuffers stream) value.

alexeyxo avatar Jul 25 '16 13:07 alexeyxo

Alex,

Maybe my understanding is still off, but I do think there is a clear problem here. I think your test is only succeeding because your file-based InputStream is able to keep up and always read the "maxLength" requested fully...

Let me try to describe it better.

I have a very simple "Data" message (simplified here):

message DataMessage {
    bytes               data                = 1;
}

The code I use to read/receive this message is like this:

    func processMessage() {
        do {
            if let msg = try DataMessage.parseDelimitedFrom(inputStream: inputStream!) {
                onMessage?(msg)
            }
            print("No more message for now")
        } catch {
            print("Parsing error: \(error)")
        }
    }

Important: My InputStream is a TCP connection here (Not a filesystem-based IO)

When the message/data size is relatively small (1K to 30K), things work just fine....

When the size gets bigger, here is what happens,

  1. A Message is still produced by parseDelimitedFrom but the data is clearly not fully there.

  2. If I modify your AbstractMessage.mergeDelimitedFrom() function like this:

    //Delimited Encoding/Decoding
    public func mergeDelimitedFrom(inputStream: InputStream) throws -> Self? {
        var firstByte:UInt8 = 0
        if inputStream.read(&firstByte, maxLength: 1) != 1 {
            return nil
        }
        let rSize = try CodedInputStream.readRawVarint32(firstByte: firstByte, inputStream: inputStream)
        let data  = Data(bytes: [0],count: Int(rSize))
        let pointer = UnsafeMutablePointer<UInt8>((data as NSData).bytes)

    // Original
    // inputStream.read(pointer, maxLength: Int(rSize))

        // Now just printing the return value if different then rSize
    let actuallyReadSize = inputStream.read(pointer, maxLength: Int(rSize))

    if (actuallyReadSize != Int(rSize)) {
        print("DANGER: actuallyReadSize(\(actuallyReadSize)) != rSize(\(rSize))")
    }
        return try mergeFrom(data: data)
    }

I do see the following printout: DANGER: actuallyReadSize(400729) != rSize(1048594)

  1. Subsequent message parsing is screwed up as the current inputStream is still receiving data from the previous message.....

Yes, clearly from this, you do the proper checking "!= 1" for the "firstByte" reading. No issue with that.

The problem is that you do no such check on the second inputStream.read() and there is no guarantee that the full data can be read (in one shot)..... My understanding is that InputStream.read() is a non-blocking call that will only return what it has read (up to maxLength).

Let me know if this makes it clearer or if I am still missing something. I will see how I can come up with a solution, but I don't know your codebase too well, so it may take some time.

Cheers. G.

gegles avatar Jul 25 '16 19:07 gegles

@alexeyxo could you just confirm to me that you too consider this as a bug?

gegles avatar Jul 29 '16 21:07 gegles

FYI, I still "know" that there is a bug here, but for now, instead of decoding the message directly from the socket InputStream, I first make sure I read all the data for the message (into a Data buffer) and only then use CodedInputStream....

gegles avatar Aug 17 '16 17:08 gegles

Hi @gegles How can you read data into Data buffer?

Kind regards,

thandang avatar Aug 25 '16 12:08 thandang

I had tried something like `let rawInput:NSInputStream = NSInputStream(data: dataResponse) rawInput.open()

        var buffer = [UInt8](count: dataResponse.length, repeatedValue: 0)

        while rawInput.hasBytesAvailable {
            let len = rawInput.read(&buffer, maxLength: buffer.count)
            if len > 0 {
                let outData = NSData(bytes: &buffer, length: buffer.count)
                let codedInputStream = CodedInputStream(data: outData)

                let lengthRaw = try codedInputStream.readRawVarint32()
                let lengthThrough = try codedInputStream.pushLimit(lengthRaw)
                protobufMessage = try Com.Celertech.Baseserver.Communication.Protobuf.ProtobufMessage.parseFromCodedInputStream(codedInputStream)
                codedInputStream.popLimit(lengthThrough)
            }
        }`

But still like it doesn't work. The protobufMessage sill be nil

Do you have any idea? Cheers.

thandang avatar Aug 25 '16 14:08 thandang