aws-appsync-community

RealTime WebSocket framing issue: frame ends with invalid UTF-8 character

Open LanderN opened this issue 1 year ago • 0 comments

Hello! I'm using AppSync RealTime over WebSocket connections and I'm experiencing intermittent problems that cause the WebSocket connection to break down and result in lost messages.

I'm experiencing this problem in a Qt program (using Qt WebSockets).

Messages may contain non-ASCII (multi-byte UTF-8) characters and may be large enough that the AppSync WebSocket server appears to split them into multiple frames. The problem seems to be incorrect framing of WebSocket messages by the AppSync RealTime endpoint: an individual frame ends with an incomplete UTF-8 character, causing a decoding error on the client (in the Qt WebSocket code).
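The failure mode described above can be reproduced outside Qt. The following is a minimal Python sketch, not the Qt or AppSync code: the sample message, the 13-byte fragment size, and the helper names `split_bytes`, `decode_per_fragment`, and `decode_incrementally` are all assumptions chosen for illustration. It shows that decoding each fragment on its own fails when a cut lands inside a character, while an incremental decoder that carries leftover bytes into the next fragment recovers the full text.

```python
import codecs


def split_bytes(data: bytes, size: int) -> list:
    """Split data into fixed-size chunks, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_per_fragment(fragments) -> str:
    """Decode every fragment independently (raises if one ends mid-character)."""
    return "".join(fragment.decode("utf-8") for fragment in fragments)


def decode_incrementally(fragments) -> str:
    """Decode fragments in order, carrying incomplete trailing bytes forward."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    parts = [decoder.decode(fragment) for fragment in fragments]
    # final=True raises if the whole message ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Hypothetical sample: 10 ASCII bytes, then "ĺ" (C4 BA), then "á" (C3 A1), ...
message = '{"name":"Lĺáàäsŝśáàänñóòö"}'
# A 13-byte cut leaves the first fragment ending in C3, as in the report.
fragments = split_bytes(message.encode("utf-8"), 13)

if __name__ == "__main__":
    try:
        decode_per_fragment(fragments)
    except UnicodeDecodeError as error:
        print("per-fragment decoding failed:", error.reason)
    print(decode_incrementally(fragments) == message)
```

Whichever side ends up responsible for the split, a client that buffers incomplete trailing bytes this way would not lose the message.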

I believe this should be fixed in the AppSync server implementation, because it means the AppSync WebSocket server does not correctly implement the WebSocket RFC (RFC 6455, https://www.rfc-editor.org/rfc/rfc6455#section-1.2): as I read it, a text frame should always contain valid UTF-8.
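As a rough illustration of what boundary-aware splitting on the sending side could look like, here is a short Python sketch. It is not AppSync's implementation, which is not public in this report; `split_on_char_boundaries` and the 13-byte limit are hypothetical, and the input is assumed to be valid UTF-8. The idea is to move each cut backwards until it no longer falls on a continuation byte (bit pattern `10xxxxxx`).

```python
def split_on_char_boundaries(data: bytes, max_size: int) -> list:
    """Split valid UTF-8 bytes into chunks of at most max_size bytes
    without cutting through a multi-byte character."""
    if max_size < 4:
        # A UTF-8 character is at most 4 bytes long, so smaller limits
        # could make it impossible to place a cut.
        raise ValueError("max_size must be at least 4")
    fragments = []
    start = 0
    while start < len(data):
        end = min(start + max_size, len(data))
        # If the first byte after the cut is a continuation byte, the cut
        # is inside a character: back off until it sits on a lead byte.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        fragments.append(data[start:end])
        start = end
    return fragments


if __name__ == "__main__":
    payload = '{"name":"Lĺáàäsŝśáàänñóòö"}'.encode("utf-8")
    for fragment in split_on_char_boundaries(payload, 13):
        # Every fragment decodes on its own because no character is cut.
        print(len(fragment), fragment.decode("utf-8"))
```

With this approach some fragments come out slightly shorter than the limit, which is the price of keeping every fragment independently decodable.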

If it helps, here is a screenshot of the point inside the Qt WebSocket implementation where the frame decoding fails: image

Memory dump of frame.payload(): {"id":"<redacted>","type":"data","payload":{"data":{"subscribeToDeviceReplies":{"mac":"<redacted>","type":"reply","data":"\"{\\\"id\\\":\\\"<redacted>\\\",\\\"list\\\":[{\\\"description\\\":\\\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\\",\\\"id\\\":0,\\\"name\\\":\\\"Lĺáàäsŝśáàänñóòö\\\",\\\"timestamp\\\":1662384560,\\\"uuid\\\":\\\"e84b1e81-a4d3-4b62-ab21-c1c397558622\\\"},{\\\"description\\\":\\\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\\",\\\"id\\\":1,\\\"name\\\":\\\"Lĺáàäsŝśáàänñóòö\\\",\\\"timestamp\\\":1662384560,\\\"uuid\\\":\\\"e84b1e81-a4d3-4b62-ab21-c1c397558622\\\"},{\\\"description\\\":\\\"aaaaaa\\\",\\\"id\\\":2,\\\"name\\\":\\\"Lĺáàäsŝśáàänñóòö\\\",\\\"timestamp\\\":1662384500,\\\"uuid\\\":\\\"e84b1e81-a4d3-4b62-ab21-c1c397558622\\\"},{\\\"description\\\":\\\"aaaaaa\\\",\\\"id\\\":3,\\\"name\\\":\\\"Lĺáàäsŝśáàänñóòö\\\",\\\"timestamp\\\":1662384500,\\\"uuid\\\":\\\"e84b1e81-a4d3-4b62-ab21-c1c397558622\\\"},{\\\"description\\\":\\\"aaaaaaaaaaa\\\",\\\"id\\\":4,\\\"name\\\":\\\"Lĺáàäsŝśáàänñóòö\\\",\\\"timestamp\\\":1662384446,\\\"uuid\\\":\\\"e84b1e81-a4d3-4b62-ab21-c1c397558622\\\"},{\\\"description\\\":\\\"aaaaaaaaaaa\\\",\\\"id\\\":5,\\\"name\\\":\\\"Lĺáàäsŝśáàänñóòö\\\",\\\"timestamp\\\":1662384446,\\\"uuid\\\":\\\"e84b1e81-a4d3-4b62-ab21-c1c397558622\\\"},{\\\"description\\\":\\\"blaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\\",\\\"id\\\":6,\\\"name\\\":\\\"Lĺáàäsŝśáàänñóòö\\\",\\\"timestamp\\\":1662384413,\\\"uuid\\\":\\\"e84b1e81-a4d3-4b62-ab21-c1c397558622\\\"},{\\\"description\\\":\\\"blaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\\",\\\"id\\\":7,\\\"name\\\":\\\"Lĺ�

image

Notice how the very last byte is C3, which is invalid on its own: it is the first byte of the two-byte encoding of á (hex: C3 A1), so the frame ends with an incomplete character. A multi-byte UTF-8 character should never be split across frames.
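To check a captured payload for this condition, the trailing bytes can be inspected directly. The sketch below is a hypothetical helper (`incomplete_tail_length` is not part of Qt or AppSync) that reports how many bytes at the end of a buffer belong to a character whose remaining bytes are missing. It relies only on the UTF-8 lead-byte patterns: `110xxxxx` starts a 2-byte sequence, `1110xxxx` a 3-byte sequence, and `11110xxx` a 4-byte sequence.

```python
def incomplete_tail_length(payload: bytes) -> int:
    """Return the number of trailing bytes that form an incomplete UTF-8
    sequence, or 0 if the payload ends on a character boundary."""
    # An incomplete sequence has at most 3 bytes, so look back at most 3.
    for back in range(1, min(3, len(payload)) + 1):
        byte = payload[-back]
        if (byte & 0xC0) == 0x80:
            continue  # continuation byte: keep looking for its lead byte
        if (byte & 0xE0) == 0xC0:
            needed = 2
        elif (byte & 0xF0) == 0xE0:
            needed = 3
        elif (byte & 0xF8) == 0xF0:
            needed = 4
        else:
            needed = 1  # ASCII (or a byte that cannot start a sequence)
        return back if back < needed else 0
    return 0


if __name__ == "__main__":
    print("á".encode("utf-8").hex())                      # bytes of á
    tail = "Lĺ".encode("utf-8") + b"\xc3"                  # ends like the dump
    print(incomplete_tail_length(tail))
    print(incomplete_tail_length("Lĺá".encode("utf-8")))
```

For the payload in this report the function would return 1, since only the lead byte C3 of á is present.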

Hopefully this can be fixed! Please let me know if I was unclear or if you need any additional information.

LanderN • Sep 05 '22 14:09