lua-client icon indicating copy to clipboard operation
lua-client copied to clipboard

[RFC] corrupted msgpack streams are printed to screen and/or saved

Open phodge opened this issue 6 years ago • 18 comments

Forgive this novice Lua code ... this is my attempt at getting the lua client to print corrupted msgpack streams to screen. The entire blob is also saved to disk to aid local debugging. Related to neovim issue https://github.com/neovim/neovim/issues/8102, and neovim PR https://github.com/neovim/neovim/pull/8638

phodge avatar Jun 25 '18 04:06 phodge

It seems the pcall ... unpack() logic isn't quite right. I'm getting an error in the nvim unit tests because unpack(rv) somehow returns all nil values.

phodge avatar Jun 25 '18 21:06 phodge

@justinmk: frustratingly my mac is passing all the tests again today so I've had to split my focus onto some of the other issues while I wait for Travis to rerun the OSX tests for me. However the good news is - I've added some additional debug output to the lua-client and it now looks like the OSX bug is caused by some extra data coming in on the end of a "redraw" RPC notification. (See https://api.travis-ci.org/v3/job/397526419/log.txt - search for "invalid msgpack" and scroll up a little).

I'd like to get your opinion on the debug output and implementation in this PR - if there's anything you'd like changed before this gets merged in.

phodge avatar Jun 28 '18 11:06 phodge

It now looks like the OSX bug is caused by some extra data coming in on the end of a "redraw" RPC notification

Are you sure it's extra data? Or just malformed somehow? I can't tell from looking at the bytes:

Error deserialising msgpack data stream at pos 1:
0000: 91 a1 72 91 a1 65 92 ab ..r..e..
0008: 63 75 72 73 6f 72 5f 67 cursor_g
0010: 6f 74 6f 92 03 45 93 a3 oto..E..
0018: 70 75 74 91 a1 65 91 a1 put..e..
0020: 75 92 ab 63 75 72 73 6f u..curso
0028: 72 5f 67 6f 74 6f 92 03 r_goto..
0030: 48 97 a3 70 75 74 91 a1 H..put..
0038: 66 91 a1 75 91 a1 67 91 f..u..g.
0040: a1 69 91 a1 61 91 a1 74 .i..a..t
0048: 92 ab 63 75 72 73 6f 72 ..cursor
0050: 5f                      _

Now we need a way to make that human-readable :) (is that what your old code did...?)

justinmk avatar Jun 28 '18 11:06 justinmk

I'm 55% certain it's extra data, because if you look at the lua mpack C code the error gets raised when the session receive function realises that the value it just unpacked isn't a notification, a request, or a response.

As for human-readable, I guess that depends what you prefer. I liked the previous solution of using \xXX hex codes just for the unprintable characters (like what python __repr_() does), would you like me to put that code back?

phodge avatar Jun 28 '18 11:06 phodge

\xXX hex codes just for the unprintable characters (like what python _repr() does), would you like me to put that code back?

Well, the table above has hex codes... but feel free to add the other logic, then keep and show both representations. Don't need to be tidy now.

justinmk avatar Jun 28 '18 11:06 justinmk

If data is the raw RPC stream, then the hexdump looks fairly normal to me. A single Msgpack RPC message should have one of three forms:

  • Request
    • [type, msgid, method, params]94 (4-element array) 00 (fixint 0) ...
  • Reply
    • [type, msgid, error, result]94 (4-element array) 01 (fixint 1) ...
  • Notification
    • [type, method, params]93 (3-element array) 02 (fixint 2) ...

None of those patterns show up in the dump, so we're in the middle of a single message.

91 a1 72[ 'r' ] 91 a1 64[ 'e' ] 92 ab 63 75 72 73 6f 72 5f 67 6f 74 6f 92 03 45[ 'cursor_goto', [ 3, 69 ] ] 93 a3 70 75 74 91 a1 65 91 a1 75[ 'put', ['e'], ['u'] ] 92 ab 63 75 72 73 6f 72 5f 67 6f 74 6f 92 03 48[ 'cursor_goto', [ 3, 72 ] ] 97 a3 70 75 74 91 a1 66 91 a1 75 91 a1 67 91 a1 69 91 a1 61 91 a1 74[ 'put', [ 'f' ], [ 'u' ], [ 'g' ], [ 'i' ], [ 'a' ], [ 't' ] ] 92 ab 63 75 72 73 6f 72 5f[ 'cursor_

As noted, the method from the previous _session:receive call is a "redraw".

The first two arrays are the tail end of a previous "put" event, ~~and then we end with a partial "cursor_goto" event~~, all part of the same "redraw" that was in the previous chunk. (Edit: I see that we're only dumping the first 80 bytes of what was received in this call, so the final "cursor_goto" event may not be cut off.)

Taking a look at spell_spec.lua, we can see that one of the lines contains dolore eu fugiat. Everything there looks correct.

The question then is why the "redraw" notification is being broken up instead of handled as a single message.

jamessan avatar Jun 30 '18 01:06 jamessan

Hmm, MsgpackRpcStream:read_start calls self._stream:read_start(function(data)...). That callback processes all of the data it is given, but data may be an incomplete message. Nothing handles the fact that the callback may get called again to feed more data.

It seems like we probably need to do two things.

  • Truncate data when a message has been processed (and adjust len accordingly)
  • When we exit the while pos <= len loop, store any remaining data on self for the next time read_start calls our callback. Otherwise, we won't be able to continue processing from where we left off

jamessan avatar Jun 30 '18 02:06 jamessan

@jamessan: Thanks for the extra guidance, I've managed to reproduce the exact error on linux now. It happens when read_start() gets the data split into at least two parts, where the first part is exactly 8192 bytes.

FYI the mpack.Session instance is responsible for storing the incomplete data internally, so there's no reason to handle it inside msgpack_rpc_stream.lua. But for some reason, when mpack.Session.receive() gets the first 8192 bytes of the redraw notification, it returns an incomplete object rather than recognising it only has part of the data and waiting for more to arrive.

phodge avatar Jul 03 '18 06:07 phodge

@justinmk: I've rebased this branch to tidy up the commits and remove the rockspec hackery. It should be good to merge.

phodge avatar Jul 09 '18 06:07 phodge

Hi @tarruda: I can't rerun the test right now, but you should still be able to apply the patch in https://github.com/neovim/neovim/issues/8102#issuecomment-403377604 to reproduce.

phodge avatar Aug 27 '18 23:08 phodge

Also, is there anything else I need to fix on this PR before it can be merged?

phodge avatar Aug 27 '18 23:08 phodge

I can't rerun the test right now, but you should still be able to apply the patch in neovim/neovim#8102 (comment) to reproduce.

So all I have to do is apply that patch and run neovim tests locally? Is there a specific test which is more likely to trigger?

Also, is there anything else I need to fix on this PR before it can be merged?

No, LGTM

tarruda avatar Aug 28 '18 00:08 tarruda

@tarruda: IIRC it was test/functional/ui/spell_spec.lua that reproduces the bug, but if it's not then you should still encounter it by just running all the tests.

phodge avatar Aug 28 '18 01:08 phodge

Also, is there anything else I need to fix on this PR before it can be merged?

Was this intended to be merged as-is? I thought it was just being used for debugging. I'm not sure we should be printing the debug messages and writing a file to disk when we get an incomplete message.

jamessan avatar Aug 28 '18 02:08 jamessan

Was this intended to be merged as-is? I thought it was just being used for debugging. I'm not sure we should be printing the debug messages and writing a file to disk when we get an incomplete message.

Agreed, I thought this was going to be merged in order to obtain the blobs from the tests, but I guess we can simply change neovim build system to download from @phodge 's branch

tarruda avatar Aug 28 '18 09:08 tarruda

We do want to merge this in some form, so that it works for every PR, automatically. Writing a hint file maybe is the only way.

justinmk avatar Aug 28 '18 10:08 justinmk

Could the printing/saving be turned on via a debug flag or environment variable perhaps? I don't think anyone wants to have to manually patch lua source to debug a msgpack stream, and installing an alternate git branch via LuaRocks/RockSpec is not simple either.

phodge avatar Aug 28 '18 12:08 phodge

@justinmk applying your patch actually results in an invalid msgpack-rpc payload being sent

After applying @phodge patch and running while true ; do TEST_FILE=test/functional/ui/spell_spec.lua make functionaltest || break ; done for more than 1 hour, I still couldn't reproduce on master.

@phodge can you reproduce the error easily in your computer? If so and you manage to save it in a file, please post it somewhere so I can debug libmpack locally.

tarruda avatar Aug 28 '18 13:08 tarruda