
Refactor: Separate RESP Command Parsing and Execution

Open lmaas opened this issue 1 year ago • 5 comments

This PR is a first step towards a more maintainable RESP parser design.

Overview

This PR tries to separate command parsing and execution by making the following changes:

  • All parsing has been unified in a single ParseCommand() function with clear semantics. If parsing is successful, the function returns the RespCommand ID, optionally a subcommand ID (if one was parsed), and the number of remaining elements in the RESP array (i.e., the number of command arguments).
  • Every command has been assigned an ID in the RespCommand enum.

Additionally, the parsing code now enforces the following semantics on the parsing function and all operators:

  1. Parsing advances the read head to the first string after the parsed command and (optional) subcommand.
  2. All operators read from the read head, removing the inconsistency between reading from raw pointers and reading from the read head.
  3. `count` represents the number of arguments remaining in the current RESP packet.
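To make the contract above concrete, here is a minimal, language-neutral sketch of those semantics (Garnet's actual parser is C# operating over byte spans; the function and variable names here are hypothetical, for illustration only). It parses a RESP array header plus the command token, returns the command and the number of remaining arguments, and leaves the read head at the first argument:

```python
def parse_command(buf: bytes, pos: int):
    """Sketch of the parsing contract for a RESP array such as
    b'*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n'.

    Returns (command, remaining_count, new_pos)."""
    # RESP array header: '*<count>\r\n'
    assert buf[pos:pos + 1] == b"*"
    end = buf.index(b"\r\n", pos)
    count = int(buf[pos + 1:end])      # total elements in the array
    pos = end + 2

    # First bulk string is the command name: '$<len>\r\n<payload>\r\n'
    assert buf[pos:pos + 1] == b"$"
    end = buf.index(b"\r\n", pos)
    length = int(buf[pos + 1:end])
    pos = end + 2
    command = buf[pos:pos + length].upper()
    pos += length + 2                  # skip payload and trailing \r\n

    # Read head now points at the first argument (semantic 1), and the
    # returned count excludes the command token itself (semantic 3).
    return command, count - 1, pos


msg = b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"
cmd, remaining, pos = parse_command(msg, 0)
# cmd == b"SET", remaining == 2, msg[pos:] starts at the first argument
```

In this shape, executors never re-scan the command token; they only consume `remaining` arguments from the read head, which is what makes it easy to interpose components such as ACL checks between parsing and execution.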

Separating parsing and command execution will allow easier integration with components such as ACL and easier replacement of the parsing logic. In addition, it allows us to properly identify any potential bottlenecks in the current parser.

Known Caveats

  • Currently, all parsing functions return (RespCommand, byte), but most commands still do their own subcommand parsing inside the command executors. This will change with future parser updates.
  • There are inconsistencies in the interfaces of different command executors. Eventually we will need to converge on a standardized interface.

lmaas avatar Mar 27 '24 23:03 lmaas

When working on #119 I ran into the problem that heavy usage of MemoryMarshal.Read hits the inlining budget, at which point the JIT refuses to inline any more methods no matter what (it even starts to ignore AggressiveInlining) toward the end of the method. I worked around this by implementing a simpler (and more unsafe) version of the method to reduce the IL size and the folding the JIT needs to do. These limits are higher on .NET 8 than on .NET 6. Did you happen to check the assembly to verify that the JIT actually inlines all these calls?

aromaa avatar Mar 28 '24 01:03 aromaa

Hi @aromaa, we really appreciate your expertise with JIT and inlining to help make this work more optimized, thanks!

badrishc avatar Mar 28 '24 01:03 badrishc

> When working on #119 I ran into the problem that heavy usage of MemoryMarshal.Read hits the inlining budget, at which point the JIT refuses to inline any more methods no matter what (it even starts to ignore AggressiveInlining) toward the end of the method. I worked around this by implementing a simpler (and more unsafe) version of the method to reduce the IL size and the folding the JIT needs to do. These limits are higher on .NET 8 than on .NET 6. Did you happen to check the assembly to verify that the JIT actually inlines all these calls?

This is great insight! We have not checked yet if the inlining is happening correctly. This should definitely go on our to-do list.

I think we'll need a few optimization passes on this, especially on the core loop. So far, we have not tuned the parsing loop much, and I am sure there is a lot of room for improvement. We'd definitely appreciate your help with this!

lmaas avatar Mar 28 '24 01:03 lmaas

> When working on #119 I ran into the problem that heavy usage of MemoryMarshal.Read hits the inlining budget, at which point the JIT refuses to inline any more methods no matter what (it even starts to ignore AggressiveInlining) toward the end of the method. I worked around this by implementing a simpler (and more unsafe) version of the method to reduce the IL size and the folding the JIT needs to do. These limits are higher on .NET 8 than on .NET 6. Did you happen to check the assembly to verify that the JIT actually inlines all these calls?

How do you check the assembly for this, any pointers?

badrishc avatar Mar 28 '24 04:03 badrishc

> How do you check the assembly for this, any pointers?

I recommend Disasmo, an excellent VS extension by a member of the .NET JIT team. Warning: it's pretty addictive.

https://github.com/microsoft/garnet/assets/26825431/a39f8c43-23bc-4743-9ff9-3a5296405247

I believe you need to clone and build the .NET runtime locally in order to use the print-inlinees feature to see the JIT's inlining decisions. You can also toggle the JitDump option to see every stage of JIT compilation; it's pretty fascinating.

If VS is not available to you, you can follow the .NET runtime guide to get the basic JitDisasm output manually, using `dotnet run` with the `DOTNET_JitDisasm=MethodName` environment variable. However, to get the FullOpts output (like in Disasmo), you have to wait for Tier1 compilation to kick in.

As JIT developers often say, methods should be marked with MethodImpl.AggressiveInlining only rarely and for good reason, as the JIT should ideally already be doing the right thing™️ with its heuristics. By using AggressiveInlining, you eat into the JIT's inlining and other optimization budgets.

It's very hard to find the right balance between JIT throughput, code size, and the resulting inlined assembly, which depends very much on call sites and other factors. That is probably why people experienced in optimizing low-level code like this keep telling you to measure rather than assume something is faster.

Here's a random example (there are more) where removing AggressiveInlining made things much faster 😅: https://github.com/dotnet/runtime/pull/81565

PaulusParssinen avatar Mar 28 '24 08:03 PaulusParssinen