blazingmq Admin API Routing

Enhances the admin command API to automatically route certain commands to the proper node(s) for execution. For example, DOMAINS DOMAIN <name> PURGE should only be executed by the primary node, so if a user sends this command to a replica node, the replica routes it to the primary and reports the result back to the user. In another case, the user may want to reconfigure the cluster with DOMAINS RECONFIGURE <domain>. With this PR, when the user sends this command to any node, that node propagates the command to every other node in the cluster and collects the responses from each.
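The two routing behaviors described above can be sketched in Python. This is an illustration only, not the broker's implementation (the broker is written in C++): `RoutingMode`, `routing_mode`, and `target_nodes` are hypothetical names, the matching rules cover just the two commands mentioned here, and including the receiving node in the cluster-wide target list is an assumption.

```python
from enum import Enum


class RoutingMode(Enum):
    LOCAL = "local"      # execute on the node that received the command
    PRIMARY = "primary"  # forward to the primary node only
    CLUSTER = "cluster"  # fan out to the cluster and collect every response


def routing_mode(command: str) -> RoutingMode:
    """Classify an admin command string (hypothetical rules, for illustration)."""
    words = command.split()
    if words[:2] == ["DOMAINS", "DOMAIN"] and words[-1:] == ["PURGE"]:
        return RoutingMode.PRIMARY
    if words[:2] == ["DOMAINS", "RECONFIGURE"]:
        return RoutingMode.CLUSTER
    return RoutingMode.LOCAL


def target_nodes(command, self_node, primary, all_nodes):
    """Return the nodes that should execute the command."""
    mode = routing_mode(command)
    if mode is RoutingMode.PRIMARY:
        return [primary]
    if mode is RoutingMode.CLUSTER:
        # Assumption: the receiving node is part of all_nodes and also executes.
        return list(all_nodes)
    return [self_node]
```

With this sketch, a PURGE sent to a replica resolves to the primary alone, while a RECONFIGURE sent to any node resolves to the full node list.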

Testing performed

This PR includes an integration test, test_admin_command_routing.py, which ensures that a command sent to an improper node is routed and that a response is sent back. The test does not verify the correctness of the command itself, only that it has been routed.

Additional context

Part of a summer 2024 internship project.

Jul 09 '24 18:07 lukedigiovanna

What type of routing, if any, do we want for the following commands:

CACHE_CLEAR commands:

DOMAINS RESOLVER CACHE_CLEAR <domain>
CONFIGPROVIDER CACHE_CLEAR <domain>

Would it be useful to route this command to all nodes in the cluster?

SET/GET commands:

CLUSTERS CLUSTER <name> STORAGE REPLICATION SET <parameter> <value>
CLUSTERS CLUSTER <name> STORAGE REPLICATION GET <parameter>
CLUSTERS CLUSTER <name> STATE ELECTOR SET <parameter> <value>
CLUSTERS CLUSTER <name> STATE ELECTOR GET <parameter>

Is there a use case for wanting to set/get a parameter on all nodes?

A potential proposal is modifying these commands to optionally set/get a parameter for just one node or for all nodes.

Jul 09 '24 18:07 lukedigiovanna

How do we want to display the result of routing to multiple nodes, particularly in various encoding formats?

When a node routes to the entire cluster, we get several responses back. The current way I have implemented displaying these results as JSON is very similar to the existing way of printing a singular command result (using baljsn::Encoder).

For example, a response of DOMAINS RECONFIGURE <domain> looks like:

{
    "responses" : [
        {
            "source" : "east\/1",
            "response" : "{\n    \"success\" : {\n\n    }\n}\n"
        },
        {
            "source" : "east\/2",
            "response" : "{\n    \"success\" : {\n\n    }\n}\n"
        },
        {
            "source" : "west\/1",
            "response" : "{\n    \"success\" : {\n\n    }\n}\n"
        },
        {
            "source" : "west\/2",
            "response" : "{\n    \"success\" : {\n\n    }\n}\n"
        }
    ]
}

A current limitation is that the response to a rerouted command is transmitted as a string, so each individual response appears as a string representation of the encoded JSON rather than as JSON itself. This could be overcome by manually writing the code to display this list of responses as JSON, though that increases complexity and is inconsistent with the existing method of encoding (namely, using baljsn::Encoder).
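To make the limitation concrete, here is a small Python sketch of what a client would have to do with the output shown above: decode the outer document, then decode each "response" string a second time. The `unpack` helper is a hypothetical name, and the sample is abridged to two nodes.

```python
import json

# Abridged copy of the routed output shown above (raw string keeps the escapes).
ROUTED_RESULT = r'''{
    "responses" : [
        {
            "source" : "east\/1",
            "response" : "{\n    \"success\" : {\n\n    }\n}\n"
        },
        {
            "source" : "west\/2",
            "response" : "{\n    \"success\" : {\n\n    }\n}\n"
        }
    ]
}'''


def unpack(routed_text):
    """Map each source node to its decoded response (two decoding passes)."""
    outer = json.loads(routed_text)
    return {
        entry["source"]: json.loads(entry["response"])  # second pass
        for entry in outer["responses"]
    }
```

The second `json.loads` call is the extra step that would disappear if each response were embedded as JSON rather than as a string.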

Is this method sufficient for our purposes, or would we like another way of representing the result of a command routed to several nodes?

Jul 09 '24 18:07 lukedigiovanna

@jll63 Are you familiar with the CACHE_CLEAR <domain> set of commands?

Jul 09 '24 21:07 kaikulimu

For the SET/GET commands, the current proposal is to add a SET <parameter> <value> flavor for only sending to the current node, and SET_ALL <parameter> <value> flavor to send to all cluster nodes.
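A rough Python sketch of how the two flavors could be told apart, for illustration only. `tunable_scope` is a hypothetical name, the `GET_ALL` verb is assumed by analogy with `SET_ALL`, and the verb position is taken from the command shapes listed in this thread.

```python
def tunable_scope(command):
    """Return "local", "cluster", or None for a SET/GET-style command.

    Assumes the shape CLUSTERS CLUSTER <name> <area> <sub-area> <verb> ...,
    which matches the STORAGE REPLICATION and STATE ELECTOR commands.
    """
    words = command.split()
    if len(words) < 6 or words[:2] != ["CLUSTERS", "CLUSTER"]:
        return None
    verb = words[5]
    if verb in ("SET_ALL", "GET_ALL"):  # GET_ALL assumed by analogy
        return "cluster"                # send to all cluster nodes
    if verb in ("SET", "GET"):
        return "local"                  # current node only, as before
    return None
```

Keeping plain SET/GET local means existing clients see no change in behavior.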

Jul 09 '24 21:07 kaikulimu

@678098 Does this response formatting look sound to you?

Jul 09 '24 22:07 kaikulimu

@lukedigiovanna @kaikulimu When designing admin commands, I think we should keep several goals in mind:

Keep the legacy text format unchanged.
Provide an easily parseable (JSON) format.
At the same time, keep the number of variations in the admin results output low.

The last point means that we should not ask admin clients to parse 3-5 different forms of output for the same admin command. We might still need to parse 2 forms: error and success responses.

As far as I understand, the current PR introduces another form of output, so there are at least 3 different forms:

Error (local exec)
Success (local exec)
Routing results (routed)

This introduces complexity on the client side, since the client needs to parse yet another case.

Another point concerns easily parseable JSON: for responses, I would like to keep the requested encoding whenever possible.

Note that we are able to encode the admin command result in JSON already: https://github.com/bloomberg/blazingmq/blob/346b9c305e74f56841eee93c3562dfae91089b91/src/groups/mqb/mqba/mqba_application.cpp#L628 It is done automatically since baljsn supports automatic conversion of schema objects. So we are also able to decode it back the same way. It will be a few more lines, but the changes are local to the place where we aggregate routing results.
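The idea of decoding each node's reply before encoding the aggregate can be sketched in Python. This is only an analogy: in the broker it would be done in C++ with baljsn, and `aggregate`, its input shape, and the fallback for non-JSON replies are all assumptions made for illustration.

```python
import json


def aggregate(node_replies):
    """Build one JSON document from per-node reply texts.

    node_replies maps a node description (e.g. "east/1") to the text that
    node returned. Replies that parse as JSON are embedded as JSON; anything
    else is kept as a plain string (assumed fallback).
    """
    merged = []
    for source, text in node_replies.items():
        try:
            body = json.loads(text)
        except json.JSONDecodeError:
            body = text
        merged.append({"source": source, "response": body})
    return json.dumps({"responses": merged}, indent=4)
```

A client can then read the aggregated result with a single decoding pass, since each "response" value is already structured.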

Do you pass the admin prefix "encoding " when you route commands? Examples of such commands are here: https://github.com/bloomberg/blazingmq/blob/346b9c305e74f56841eee93c3562dfae91089b91/src/integration-tests/test_admin_client.py#L161

Jul 10 '24 12:07 678098

Keep the legacy text format unchanged.

I think the proposal to add a SET_ALL/GET_ALL flavor for the STORAGE REPLICATION and STATE ELECTOR commands maintains this, since a client can still use SET/GET just as before, so it's backwards compatible.

As far as I understood, in the current PR we introduced another form of output, so there are at least 3 different forms:

Error (local exec)

Success (local exec)

Routing results (routed)

Yes, this seems right. I'm not sure how to report the results any other way. When a command is routed to just one other node (e.g. to the primary), the response is reported back to the client as is, with no extra "responses" list or details about the node it was routed to. When a command is routed to more than one node, a response like the one above is returned.

Where I think this approach gets inconvenient is in the possible [future] case of multiple primaries for a cluster. Then sometimes we would get a single response as is expected now (when 1 primary) or a list of responses (when multiple). I'm not sure what would be a good solution client-side, but I want to note that now.
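One possible client-side way to cope with both shapes is to normalize every result into a list, sketched below in Python. `as_response_list` is a hypothetical helper, and it assumes the only distinguishing feature of a multi-node result is the top-level "responses" key.

```python
import json


def as_response_list(result_text):
    """Normalize an admin result into a list of (source, response) pairs.

    A multi-node result yields one pair per node. A single result yields
    one pair whose source is None, since the output does not say which
    node executed the command.
    """
    result = json.loads(result_text)
    if isinstance(result, dict) and "responses" in result:
        return [(entry["source"], entry["response"]) for entry in result["responses"]]
    return [(None, result)]
```

Callers can then iterate over the list without caring whether the command went to one primary or to several nodes.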

Do you pass admin prefix "encoding " when you route commands?

Yes, when I route a command I just pass the exact command string that was sent to the original node, so the encoding is preserved.

Jul 10 '24 16:07 lukedigiovanna

@lukedigiovanna Don't forget to add license banners back to mqbcfg_messages.*

Aug 12 '24 18:08 kaikulimu

Congratulations @lukedigiovanna!

Aug 15 '24 12:08 678098