
Manager: Support small_table_optimization feature

Open mykaul opened this issue 1 year ago • 11 comments

See https://github.com/scylladb/scylladb/pull/15974

mykaul avatar Nov 26 '23 09:11 mykaul

A new API option, "small_table_optimization", has been added. Manager should set this value to true for all system tables.

tzach avatar Nov 26 '23 10:11 tzach

Code is in - https://github.com/scylladb/scylladb/pull/15974

mykaul avatar Nov 28 '23 07:11 mykaul

@karol-kokoszka FYI.

With this feature:

- The user does not need to specify token ranges to repair. It will repair all
token ranges automatically.

- Users only need to send the repair REST API request to one of the nodes in the
cluster. It can be any of the nodes in the cluster.

- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.

Item 3 will allow us to use small table optimization for more tables.
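
A minimal sketch of what such a request could look like from the client side. It only builds the URL; no token ranges and no host list are passed, and the request would go to any single node. The endpoint path `/storage_service/repair_async/{keyspace}` and the `columnFamilies` parameter name are assumptions here (check the node's Swagger definition), and `build_small_table_repair_url` is a hypothetical helper, not Manager code.

```python
from urllib.parse import quote, urlencode


def build_small_table_repair_url(node, keyspace, table, port=10000):
    """Build the URL of a whole-table repair request for a single node.

    Assumed (not confirmed by this thread): the endpoint path and the
    columnFamilies parameter name. small_table_optimization is the option
    discussed in this issue.
    """
    query = urlencode(
        {
            "columnFamilies": table,
            "small_table_optimization": "true",
        }
    )
    path = f"/storage_service/repair_async/{quote(keyspace, safe='')}"
    return f"http://{node}:{port}{path}?{query}"


if __name__ == "__main__":
    # Any node of the cluster can receive the request.
    print(build_small_table_repair_url("10.0.0.1", "system_auth", "roles"))
```

The resulting URL would be sent as a POST, for example with `urllib.request.Request(url, method="POST")`.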

asias avatar Dec 05 '23 01:12 asias

@karol-kokoszka - this should be a '3.3' item I reckon, but it's an important one, hopefully we'll be able to get all this wrapped into 2024.1 (and perhaps backport eventually to 2023.1.x!)

mykaul avatar Dec 05 '23 09:12 mykaul

@asias How can SM know whether a node supports the small_table_optimization param? I also tested that Scylla doesn't complain when it gets an unknown param in a repair API call. This creates the danger that SM sends an API call with small_table_optimization, expecting it to repair the whole table, but the param is silently ignored and the table isn't repaired.

Michal-Leszczynski avatar Dec 05 '23 13:12 Michal-Leszczynski

@asias How can SM know whether a node supports the small_table_optimization param? I also tested that Scylla doesn't complain when it gets an unknown param in a repair API call. This creates the danger that SM sends an API call with small_table_optimization, expecting it to repair the whole table, but the param is silently ignored and the table isn't repaired.

This is a good observation. Scylla core should reject the unknown options.

I created an issue here:

https://github.com/scylladb/scylladb/issues/16299

and a PR here:

https://github.com/scylladb/scylladb/pull/16300

asias avatar Dec 06 '23 01:12 asias

@amnonh Is there a REST API that lists the supported parameters for a given REST API endpoint?

For example, given the entry below from api/api-doc/storage_service.json, can a user query the REST API to learn that parameters.id and parameters.timeout are supported?

      {
         "path":"/storage_service/repair_status/",
         "operations":[
            {
               "method":"GET",
               "summary":"Query the repair status and return when the repair is finished or timeout",
               "type":"string",
               "enum":[
                  "RUNNING",
                  "SUCCESSFUL",
                  "FAILED"
               ],
               "nickname":"repair_await_completion",
               "produces":[
                  "application/json"
               ],
               "parameters":[
                  {
                     "name":"id",
                     "description":"The repair ID to check for status",
                     "required":true,
                     "allowMultiple":false,
                     "type": "long",
                     "paramType":"query"
                  },
                  {
                     "name":"timeout",
                     "description":"Seconds to wait before the query returns even if the repair is not finished. The value -1 or not providing this parameter means no timeout",
                     "required":false,
                     "allowMultiple":false,
                     "type": "long",
                     "paramType":"query"
                  }
               ]
            }
         ]
      },

asias avatar Dec 06 '23 01:12 asias

Yes, we document the API using Swagger. The old API uses Swagger 1.2, which means each part of the API is under a different URL. With a working Scylla instance you can use the Swagger UI that comes with Scylla: http://localhost:10000/ui/ [Screenshot from 2023-12-06 11-53-17]

And here is the repair status: [Screenshot from 2023-12-06 11-55-04]

The relevant Swagger definition can be downloaded from: http://localhost:10000/api-doc/storage_service/
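
As a rough sketch of how a client could use that definition to check which parameters an endpoint accepts: the code below walks a Swagger 1.2 style document and collects parameter names for one path and method. It assumes the downloaded document wraps entries like the repair_status one quoted earlier in a top-level "apis" list; `fetch_api_doc` and `supported_parameters` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_api_doc(base_url="http://localhost:10000"):
    """Download the storage_service API definition from a running node."""
    with urlopen(f"{base_url}/api-doc/storage_service/") as resp:
        return json.load(resp)


def supported_parameters(api_doc, path, method="GET"):
    """Return the set of parameter names declared for path and method.

    Assumes a Swagger 1.2 layout: a top-level "apis" list whose entries
    have "path" and "operations" keys.
    """
    names = set()
    for api in api_doc.get("apis", []):
        if api.get("path") != path:
            continue
        for op in api.get("operations", []):
            if op.get("method") == method:
                names.update(p["name"] for p in op.get("parameters", []))
    return names


if __name__ == "__main__":
    # Offline example using a trimmed copy of the repair_status entry.
    doc = {
        "apis": [
            {
                "path": "/storage_service/repair_status/",
                "operations": [
                    {
                        "method": "GET",
                        "parameters": [{"name": "id"}, {"name": "timeout"}],
                    }
                ],
            }
        ]
    }
    print(sorted(supported_parameters(doc, "/storage_service/repair_status/")))
```

A caller could apply the same lookup to the repair endpoint and only send small_table_optimization if the name appears in the returned set.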

amnonh avatar Dec 06 '23 09:12 amnonh

@asias How can SM know whether a node supports the small_table_optimization param? I also tested that Scylla doesn't complain when it gets an unknown param in a repair API call. This creates the danger that SM sends an API call with small_table_optimization, expecting it to repair the whole table, but the param is silently ignored and the table isn't repaired.

For now, stick with a simple rule: 2024.2 and above. (Later we may backport this to 2023.1.x; unsure.)
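
A minimal sketch of that version rule, assuming the node reports an enterprise-style version string such as "2024.2.0". The function name and the parsing approach are illustrative only; anything that does not parse, or that is below 2024.2 (which includes open-source style versions like "5.4.1"), is treated as unsupported so that the parameter is never sent to a node that might silently ignore it.

```python
import re

# Minimum (major, minor) release assumed to understand the option,
# following the "2024.2 and above" rule from this thread.
MIN_SUPPORTED = (2024, 2)


def supports_small_table_optimization(version):
    """Return True if the version string is at or above MIN_SUPPORTED."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        # Unknown format: be conservative and do not use the option.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= MIN_SUPPORTED


if __name__ == "__main__":
    for v in ("2024.2.0", "2024.1.3", "2023.1.5", "5.4.1"):
        print(v, supports_small_table_optimization(v))
```

If the feature is later backported to 2023.1.x, the rule would need an explicit allowance for those patch releases rather than a single lower bound.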

mykaul avatar Dec 07 '23 10:12 mykaul

@asias A follow-up question regarding this feature:

  • No token range to repair is needed by the user. It will repair all token ranges automatically.
  • Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster.
  • It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node.

When sending repair with small_table_optimization enabled, does Scylla respect hosts param?

            "name": "hosts",
            "in": "query",
            "required": false,
            "type": "string",
            "description": "Which hosts are to participate in this repair. Multiple hosts can be listed separated by commas."

SM uses this param to orchestrate repair on nodes from a specified DC, ignore down nodes, etc. So does repairing a cluster with 1 node down, using small_table_optimization and hosts (excluding the down node), go well?

Michal-Leszczynski avatar Dec 15 '23 12:12 Michal-Leszczynski

@asias A follow-up question regarding this feature:

  • No token range to repair is needed by the user. It will repair all token ranges automatically.
  • Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster.
  • It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node.

When sending repair with small_table_optimization enabled, does Scylla respect hosts param?

            "name": "hosts",
            "in": "query",
            "required": false,
            "type": "string",
            "description": "Which hosts are to participate in this repair. Multiple hosts can be listed separated by commas."

SM uses this param to orchestrate repair on nodes from a specified DC, ignore down nodes, etc. So does repairing a cluster with 1 node down, using small_table_optimization and hosts (excluding the down node), go well?

Hello Michal,

The small_table_optimization option is designed to repair all ranges and all nodes in the cluster. We currently do not wire the hosts and DC selection into it. It does not make much sense to use small_table_optimization while repairing only some of the DCs anyway. We can start by using small_table_optimization when the user specifies none of these restrictions, which should be the most common case. This feature is mainly for system table repair pains like the ones we have with system_auth.
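
The suggested starting point can be sketched as a small decision function: use small_table_optimization only when the user has given no host, DC, or range restriction, and otherwise fall back to a regular repair request. This is an illustration only, not Manager's implementation; `repair_params` is a hypothetical helper, and the `columnFamilies`, `dataCenters`, and `ranges` parameter names are assumptions (only `hosts` is quoted in this thread).

```python
def repair_params(table, hosts=None, dcs=None, ranges=None):
    """Build query parameters for a table repair request.

    small_table_optimization is set only when no restriction is given,
    because it is not wired to host or DC selection.
    """
    params = {"columnFamilies": table}
    if not (hosts or dcs or ranges):
        # Unrestricted repair: let Scylla repair all ranges on all nodes.
        params["small_table_optimization"] = "true"
        return params
    # Restricted repair: pass the restrictions and skip the optimization.
    if hosts:
        params["hosts"] = ",".join(hosts)
    if dcs:
        params["dataCenters"] = ",".join(dcs)
    if ranges:
        params["ranges"] = ",".join(ranges)
    return params


if __name__ == "__main__":
    print(repair_params("roles"))
    print(repair_params("roles", hosts=["10.0.0.1", "10.0.0.2"]))
```

With a down node excluded through hosts, this sketch would take the restricted branch and not send small_table_optimization.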

asias avatar Dec 19 '23 00:12 asias