garnet Cluster Command Parsing Refactor

This PR addresses regressions due to changes in the semantics of DrainCommand and to adhere to the new parsing restrictions. It refactors ClusterCommands.cs into multiple files similar to how other Network commands are implemented. The PR contains the following enchantments

I separated cluster subcommand parsing from execution.
I added explicit argument checks for each cluster command.
I created a test to validate the functionaliyt of (2.).
I created a test to send AddSlots into chunks and ensure that session is not disposed while waiting for more data.

Some open questions

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan<T>.SequenceEqual. I could benchmark it or if someone can chime in that will be great.
In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

May 10 '24 22:05 vazois

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).

There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.

out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅

edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason) https://github.com/microsoft/garnet/blob/8856dc3990fb0863141cb902bbf64c13202d5f85/libs/server/Resp/Objects/ObjectStoreUtils.cs#L75-L89

We could have something like:

namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}

but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.

May 10 '24 22:05 PaulusParssinen

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).

There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

Thanks for the clarification. Are you saying span.SequenceEnqual is more readable? I don't have a personal preference. I pushed a new commit where I follow the pattern used in ParseCommand (i.e. MemoryMarshal.Read)

In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.

out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅

edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason)

https://github.com/microsoft/garnet/blob/8856dc3990fb0863141cb902bbf64c13202d5f85/libs/server/Resp/Objects/ObjectStoreUtils.cs#L75-L89

We could have something like:
namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}
but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.

I will see if I can address this. For cluster commands, we already pay the cost of extracting a Span from the receive buffer, so it would be easy to have an uppercase bounded by the Span boundaries. I will have to benchmark the case where MakeUpperCase is used in ProcessMessages. Also I was thinking, that for internode communication commands, we don't need to call MakeUpperCase because they already formatted like this and external clients are not supposed to call them directly. So maybe I can separate the switch statement into two parts.

May 13 '24 16:05 vazois

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other). There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

Thanks for the clarification. Are you saying span.SequenceEnqual is more readable? I don't have a personal preference. I pushed a new commit where I follow the pattern used in ParseCommand (i.e. MemoryMarshal.Read)
In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us. out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅 edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason) https://github.com/microsoft/garnet/blob/8856dc3990fb0863141cb902bbf64c13202d5f85/libs/server/Resp/Objects/ObjectStoreUtils.cs#L75-L89

We could have something like:
namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}
but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.
I will see if I can address this. For cluster commands, we already pay the cost of extracting a Span from the receive buffer, so it would be easy to have an uppercase bounded by the Span boundaries. I will have to benchmark the case where MakeUpperCase is used in ProcessMessages. Also I was thinking, that for internode communication commands, we don't need to call MakeUpperCase because they already formatted like this and external clients are not supposed to call them directly. So maybe I can separate the switch statement into two parts.

I removed MemoryExtensions and reverted back to span.SequenceEqual. Looks much better now 😄

May 13 '24 21:05 vazois

MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to

MakeUpperCase does have a length limit of whatever the input buffer contains. It also stops conversion as soon as it sees a protocol character (such as \r and \n), so it should not make its way into user data even within the input buffer.

May 14 '24 00:05 badrishc

garnet garnet copied to clipboard

Cluster Command Parsing Refactor

garnet
garnet copied to clipboard