stonedb icon indicating copy to clipboard operation
stonedb copied to clipboard

[RFC] SQL supported on StoneDB v2.0

Open hustjieke opened this issue 1 year ago • 1 comments

SQL Supported on StoneDB V2.0

This is a supplement to https://github.com/stoneatom/stonedb/issues/436

  • Overview
  • SQL Syntaxes
    • Create a Tianmu Table
    • Load the Data from an InnoDB Table to a Tianmu Table
      • Exclude Table Columns
      • Define the Secondary Engine
      • Load Tables
    • Run Queries
    • Monitor System Status

Overview

In this issue, we give out the design of SQL statements supported by StoneDB V2.0. As mentioned in #436, StoneDB V2.0 will support more SQL statements compared to V1.0, including SQL statements used to create Tianmu tables, load data from InnoDB tables, and monitor StoneDB status. Developers can follow the guidance below to try these SQL statements.

SQL Syntaxes

This part describes the SQL syntaxes introduced in StoneDB V2.0.

Create a Tianmu Table

First, let’s see how to create a Tianmu table. The SQL syntaxes used in StoneDB V2.0 is different from that used in V1.0. In StoneDB V1.0, you simply need to specify the engine type to tianmu. Following provides the syntax:

--- SQL syntax in StoneDB V1.0
create table orders (
   col1 int, 
   col2 varchar(10)
) engine = tianmu; 

However, StoneDB V2.0 adopts a new mechanism called secondary engine. This makes the SQL syntaxes in creating tables and loading data in V2.0 different from those in V1.0.

Following provides the SQL syntaxes that you can use to create a Tianmu table in StoneDB V2.0:

  • Using a column comment to specify that the column in the table is in the Tianmu engine as well as the encoding type for the column:
CREATE TABLE orders (nameVARCHAR(100) COMMENT 'TIANMU_COLUMN=ENCODING=SORTED' );

Before using column comments to specify columns to store in Tianmu, enable Tianmu as the secondary engine so that the columns then defined as TIANMU_COLUMN will be loaded into the Tianmu engine.

  • Specifying Tianmu as the secondary engine:
CREATETABLE orders (id INT) SECONDARY_ENGINE= Tianmu;

In V1.0, to create a Tianmu table, we must set engine to tianmu to use Tianmu as the primary engine. However, in V2.0, we WILL NOT use the engine field.

In V2.0, we specify Tianmu as the secondary engine, instead of the primary engine. As shown in the example above, we set SECONDARY_ENGINE to Tianmu. The SECONDARY_ENGINE field is also used in MySQL 8.0.

  • Suppose you initiate an analytical query that will access data in a table that is stored in InnoDB. You definitely can use the InnoDB engine to process this query, but the execution time will be long. In this case, we recommend that you convert this table to a Tianmu table and then use the Tianmu engine to process this table. To convert an InnoDB table to a Tianmu table, execute an ALTER TABLE statement to change the storage engine for the table. Following provides the syntax:
ALTERTABLE orders SECONDARY_ENGINE= Tianmu;

If you do not want to convert the entire table, you can also use a column comment to convert needed columns in the table to Tianmu columns. Following provides the syntax:

ALTERTABLE orders MODIFY name VARCHAR(100) COMMENT 'TIANM_COLUMN=ENCODING=SORTED';
  • Compression As we discussed in the Architecture of StoneDB V2.0, the data is in memory and organized in a column-based style. The column-based data format is compression friendly. In Tianmu, system will choose the best compression algorithm to compress data. StoneDB V1.0 already supports more than 20 compression algorithms, including LZ4, b2, PPM, and Delta.
--Parameter that specifies whether to enable data compression:
Tianmu_Compression=[ON/OFF]

Data compression saves storage space, though it has a minor impact on performance query runtimes, the rate at which queries are offloaded to Tianmu during change propagation, and the recovery stage.

  • Encoding Encoding string columns helps accelerate processing of queries that access these columns. Tianmu supports two types of string column encoding:

Variable-length encoding (VARLEN)

Dictionary encoding (SORTED)

When tables are loaded into Tianmu, variable-length encoding is applied to CHAR, VARCHAR, and TEXT-type columns by default. To use dictionary encoding, you must define the TIANMU_COLUMN=ENCODING=SORTED keyword string in a column comment before loading the table. The keyword string must be uppercase. Otherwise, it will be ignored.

You can define the keyword string in a CREATE TABLE or ALTER TABLE statement. Following are examples:

CREATE TABLE orders (name VARCHAR(100) COMMENT 'TIANMU_COLUMN=ENCODING=SORTED');
ALTER TABLE orders MODIFY name VARCHAR(100) COMMENT 'TIANMU_COLUMN=ENCODING=SORTED';

If you want to run JOIN operations involving string columns or use string functions and operators, we recommend that you use variable-length encoding. Variable-length encoding supports more expressions, filters, functions, and operators than dictionary encoding. Otherwise, select the encoding type based on the number of distinct values in the string column relative to the cardinality of the table.

  • Data types The supported data types in Tianmu are listed here. Considering what kind of data will be used to run analytical queries to find the inner relationship between the data. Firstly, Tianmu will support ONLY three kinds of data type: numeric, date and time, and string.

   - BIGINT

   - BOOL

Load the Data from an InnoDB Table to a Tianmu Table

Exclude Table Columns

Before loading a table into Tianmu, exclude columns with unsupported data types. Otherwise, the operation will fail. Therefore, we need to add column attribute NOT SECONDARY in an ALTER TABLE or CREATE TABLE statement. Add a new options check named description_secondary_definition.

create_definition: {
  ...
  ...
  | check_constraint_definition
  | description_secondary_definition
}

description_secondary_definition: 
DESCRIPTION data_type NOT SECONDARY
alter_option: {
  ...
  ...
  | MODIFY [COLUMN] col_name column_definition
        [FIRST | AFTER col_name]
  | MODIFY description_secondary_definition
  ...
  ...
}

description_secondary_definition: 
DESCRIPTION data_type NOT SECONDARY

For example:

mysql> ALTER TABLE orders MODIFY description BLOB NOT SECONDARY;
mysql> CREATE TABLE orders (id INT, description BLOB NOT SECONDARY);

NOTE: If a query accesses a column defined with the NOT SECONDARY attribute, the query is executed on InnoDB by default.

Define the Secondary Engine

Add keyword SECONDARY_ENGINE to table_option. SECONDARY_ENGINE_SYM has already implemented in sql/lex.h and sql/sql_yacc.yy The table option table_option can be used in both ALTER TABLE and CREATE TABLE statements:

table_option: {
    AUTOEXTEND_SIZE [=] value
  | AUTO_INCREMENT [=] value
  ...
  ...
  | SECONDARY_ENGINE = engine_name
  | SECONDARY_ENGINE_ATTRIBUTE [=] 'string'
  ...
  ...
}

For example:

mysql> ALTER TABLE orders SECONDARY_ENGINE = Tianmu;
 
mysql> CREATE TABLE orders (id INT) SECONDARY_ENGINE = Tianmu;

Loade Tables

To load a table into Tianmu, specify the SECONDARY_LOAD option in an ALTER TABLE statement. The keyword SECONDARY_LOAD has been added in sql/sql_yacc.yy:

        | SECONDARY_LOAD_SYM
          {
            $$= NEW_PTN PT_alter_table_secondary_load;
          }
        | SECONDARY_UNLOAD_SYM

Classes and functions defined in sql/sql_alter.h:

/**
  Represents ALTER TABLE SECONDARY_LOAD/SECONDARY_UNLOAD statements.
*/
class Sql_cmd_secondary_load_unload final : public Sql_cmd_common_alter_table {
 public:
  // Inherit the constructors from the parent class.
  using Sql_cmd_common_alter_table::Sql_cmd_common_alter_table;

  bool execute(THD *thd) override;

 private:
  bool mysql_secondary_load_or_unload(THD *thd, TABLE_LIST *table_list);
};

For example:

mysql> ALTER TABLE orders SECONDARY_LOAD;

Run Queries

From the users’ perspective, the method to run queries in StoneDB, no matter V1.0 or V2.0, is the same as that in MySQL. As long as you know how to use MySQL, you master how to use StoneDB. However, the query processing procedure is slightly different between MySQL and StoneDB V2.0. MySQL performs the following steps when processing a query:

  1. Parse the query to generate an abstract syntax tree (AST).
  2. Pass the AST to the optimizer for logical and physical optimization.
  3. Create an execution plan based on the query plan.
  4. Handle the query based on the execution plan.

The difference lies between step 3 and step 4. After completing optimization, StoneDB V2.0 will generate an execution plan for each storage engine, compare the costs of the execution plans, and then route the query to the storage engine that offers the lower cost. For more information, see issue #xxx.

Additional Information

  • SELECT statements will be routed to Tianmu. However, if the optimizer identifies that InnoDB is faster in processing certain SELECT statements, the SELECT statements will not be routed to Tianmu.

  • If you want to use Tianmu to process a query on certain tables, you must define these tables as Tianmu tables. Otherwise, the query WILL NOT be routed to Tianmu. What’s more, the tables MUST be loaded into Tianmu.

  • Query statements that have functions or operators unsupported by Tianmu cannot be executed by using Tianmu.

  • The automatic transaction feature must be enabled. Otherwise, workloads will NOT be routed to Tianmu.

[Query examples will be listed here later.]

Monitor System Status

To monitor system status, certain system variables must be configured, which are used to monitor:

  • Whether Tianmu is enabled. The Tianmu engine can be enabled at two levels: global and session-level.
  • Status of Tianmu nodes
  • Resource usage information of Tianmu, such as memory usage
  • Status of load operations of Tianmu
  • Buffer status of Tianmu
  • ...

hustjieke avatar Aug 19 '22 15:08 hustjieke

ACK.

hustjieke avatar Aug 24 '22 07:08 hustjieke