lua-client
lua-client copied to clipboard
[RFC] corrupted msgpack streams are printed to screen and/or saved
Forgive this novice Lua code ... this is my attempt at getting the lua client to print corrupted msgpack streams to screen. The entire blob is also saved to disk to aid local debugging. Related to neovim issue https://github.com/neovim/neovim/issues/8102, and neovim PR https://github.com/neovim/neovim/pull/8638
It seems the pcall
... unpack()
logic isn't quite right. I'm getting an error in the nvim unit tests because unpack(rv)
somehow returns all nil values.
@justinmk: frustratingly my mac is passing all the tests again today so I've had to split my focus onto some of the other issues while I wait for Travis to rerun the OSX tests for me. However the good news is - I've added some additional debug output to the lua-client and it now looks like the OSX bug is caused by some extra data coming in on the end of a "redraw" RPC notification. (See https://api.travis-ci.org/v3/job/397526419/log.txt - search for "invalid msgpack" and scroll up a little).
I'd like to get your opinion on the debug output and implementation in this PR - if there's anything you'd like changed before this gets merged in.
It now looks like the OSX bug is caused by some extra data coming in on the end of a "redraw" RPC notification
Are you sure it's extra data? Or just malformed somehow? I can't tell from looking at the bytes:
Error deserialising msgpack data stream at pos 1:
0000: 91 a1 72 91 a1 65 92 ab ..r..e..
0008: 63 75 72 73 6f 72 5f 67 cursor_g
0010: 6f 74 6f 92 03 45 93 a3 oto..E..
0018: 70 75 74 91 a1 65 91 a1 put..e..
0020: 75 92 ab 63 75 72 73 6f u..curso
0028: 72 5f 67 6f 74 6f 92 03 r_goto..
0030: 48 97 a3 70 75 74 91 a1 H..put..
0038: 66 91 a1 75 91 a1 67 91 f..u..g.
0040: a1 69 91 a1 61 91 a1 74 .i..a..t
0048: 92 ab 63 75 72 73 6f 72 ..cursor
0050: 5f _
Now we need a way to make that human-readable :) (is that what your old code did...?)
I'm 55% certain it's extra data, because if you look at the lua mpack C code the error gets raised when the session receive function realises that the value it just unpacked isn't a notification, a request, or a response.
As for human-readable, I guess that depends what you prefer. I liked the previous solution of using \xXX
hex codes just for the unprintable characters (like what python __repr_()
does), would you like me to put that code back?
\xXX hex codes just for the unprintable characters (like what python _repr() does), would you like me to put that code back?
Well, the table above has hex codes... but feel free to add the other logic, then keep and show both representations. Don't need to be tidy now.
If data
is the raw RPC stream, then the hexdump looks fairly normal to me. A single Msgpack RPC message should have one of three forms:
- Request
-
[type, msgid, method, params]
⇒94
(4-element array)00
(fixint 0) ...
-
- Reply
-
[type, msgid, error, result]
⇒94
(4-element array)01
(fixint 1) ...
-
- Notification
-
[type, method, params]
⇒93
(3-element array)02
(fixint 2) ...
-
None of those patterns show up in the dump, so we're in the middle of a single message.
91 a1 72
⇒ [ 'r' ]
91 a1 64
⇒ [ 'e' ]
92 ab 63 75 72 73 6f 72 5f 67 6f 74 6f 92 03 45
⇒ [ 'cursor_goto', [ 3, 69 ] ]
93 a3 70 75 74 91 a1 65 91 a1 75
⇒ [ 'put', ['e'], ['u'] ]
92 ab 63 75 72 73 6f 72 5f 67 6f 74 6f 92 03 48
⇒ [ 'cursor_goto', [ 3, 72 ] ]
97 a3 70 75 74 91 a1 66 91 a1 75 91 a1 67 91 a1 69 91 a1 61 91 a1 74
⇒ [ 'put', [ 'f' ], [ 'u' ], [ 'g' ], [ 'i' ], [ 'a' ], [ 't' ] ]
92 ab 63 75 72 73 6f 72 5f
⇒ [ 'cursor_
As noted, the method from the previous _session:receive
call is a "redraw".
The first two arrays are the tail end of a previous "put" event, ~~and then we end with a partial "cursor_goto" event~~, all part of the same "redraw" that was in the previous chunk. (Edit: I see that we're only dumping the first 80 bytes of what was received in this call, so the final "cursor_goto" event may not be cut off.)
Taking a look at spell_spec.lua
, we can see that one of the lines contains dolore eu fugiat
. Everything there looks correct.
The question then is why the "redraw" notification is being broken up instead of handled as a single message.
Hmm, MsgpackRpcStream:read_start
calls self._stream:read_start(function(data)...)
. That callback processes all of the data
it is given, but data
may be an incomplete message. Nothing handles the fact that the callback may get called again to feed more data.
It seems like we probably need to do two things.
- Truncate
data
when a message has been processed (and adjustlen
accordingly) - When we exit the
while pos <= len
loop, store any remainingdata
onself
for the next timeread_start
calls our callback. Otherwise, we won't be able to continue processing from where we left off
@jamessan: Thanks for the extra guidance, I've managed to reproduce the exact error on linux now. It happens when read_start() gets the data split into at least two parts, where the first part is exactly 8192 bytes.
FYI the mpack.Session
instance is responsible for storing the incomplete data internally, so there's no reason to handle it inside msgpack_rpc_stream.lua. But for some reason, when mpack.Session.receive()
gets the first 8192 bytes of the redraw notification, it returns an incomplete object rather than recognising it only has part of the data and waiting for more to arrive.
@justinmk: I've rebased this branch to tidy up the commits and remove the rockspec hackery. It should be good to merge.
Hi @tarruda: I can't rerun the test right now, but you should still be able to apply the patch in https://github.com/neovim/neovim/issues/8102#issuecomment-403377604 to reproduce.
Also, is there anything else I need to fix on this PR before it can be merged?
I can't rerun the test right now, but you should still be able to apply the patch in neovim/neovim#8102 (comment) to reproduce.
So all I have to do is apply that patch and run neovim tests locally? Is there a specific test which is more likely to trigger?
Also, is there anything else I need to fix on this PR before it can be merged?
No, LGTM
@tarruda: IIRC it was test/functional/ui/spell_spec.lua
that reproduces the bug, but if it's not then you should still encounter it by just running all the tests.
Also, is there anything else I need to fix on this PR before it can be merged?
Was this intended to be merged as-is? I thought it was just being used for debugging. I'm not sure we should be printing the debug messages and writing a file to disk when we get an incomplete message.
Was this intended to be merged as-is? I thought it was just being used for debugging. I'm not sure we should be printing the debug messages and writing a file to disk when we get an incomplete message.
Agreed, I thought this was going to be merged in order to obtain the blobs from the tests, but I guess we can simply change neovim build system to download from @phodge 's branch
We do want to merge this in some form, so that it works for every PR, automatically. Writing a hint file maybe is the only way.
Could the printing/saving be turned on via a debug flag or environment variable perhaps? I don't think anyone wants to have to manually patch lua source to debug a msgpack stream, and installing an alternate git branch via LuaRocks/RockSpec is not simple either.
@justinmk applying your patch actually results in an invalid msgpack-rpc payload being sent
After applying @phodge patch and running while true ; do TEST_FILE=test/functional/ui/spell_spec.lua make functionaltest || break ; done
for more than 1 hour, I still couldn't reproduce on master.
@phodge can you reproduce the error easily in your computer? If so and you manage to save it in a file, please post it somewhere so I can debug libmpack locally.