horaedb icon indicating copy to clipboard operation
horaedb copied to clipboard

Column order is not what user define when create table

Open jiacai2050 opened this issue 3 years ago • 7 comments

Describe this problem

Table' columns order is not what user define when create table

Steps to reproduce

Execute those SQL

CREATE TABLE test_order (
    host string tag,
    v2 int,
    ts timestamp NOT NULL,
    v1 int,
    timestamp KEY (ts)
)ENGINE = Analytic WITH (
    enable_ttl = 'false'
);

desc test_order;

desc will output

ts	timestamp	1	0	0
tsid	uint64	1	0	0
host	string	0	1	1
v1	int	0	1	1
v2	int	0	1	0

Expected behavior

The column order should be same with the order when created.

Additional Information

https://github.com/CeresDB/ceresdb/blob/3e826ffbb98d2c5351e63b28dd9dd4644e7d4c1e/sql/src/planner.rs#L265

The root cause is currently we use a BTreeMap when build create table plan, it should use a vector to keep the ordering

ref #154

jiacai2050 avatar Aug 11 '22 02:08 jiacai2050

After some thoughts, I find to fix this issue is a non-trivial task.

Currently CeresDB will put primary key columns at the front, and use a num_key_columns in Schema to indicate how many columns are in primary key, so if we keep the original column ordering, then we need more info to know which column belongs to primary key.

jiacai2050 avatar Aug 24 '22 08:08 jiacai2050

Can we reorder fields by subscripting the fields? like
https://github.com/CeresDB/ceresdb/blob/5c14f0055fb6816793b81bbc50c61b12b3079d5c/server/src/handlers/sql.rs#L201-L207

dust1 avatar Sep 15 '22 00:09 dust1

We can organize columns like what you said, but the more tricky issue is the order is used for primary key identification.

The first num_key_columns columns are primary key, if we keep the original order, then we need to keep some info to tell which columns are primary key

jiacai2050 avatar Sep 15 '22 06:09 jiacai2050

This looks like the BTree needs to be removed, will it infect the storage layer?

dust1 avatar Sep 15 '22 07:09 dust1

will it infect the storage layer?

I think it will.

The rows within one table will be ordered by primary key. For example, the key is composed of primary key + sequence in memtable

  • https://github.com/CeresDB/ceresdb/blob/a74eb6b5cbb89ef03cf7cf42a464d584bdd0b744/analytic_engine/src/memtable/skiplist/mod.rs#L84
  • https://github.com/CeresDB/ceresdb/blob/a74eb6b5cbb89ef03cf7cf42a464d584bdd0b744/analytic_engine/src/memtable/key.rs#L123

If we want to fix this, we can remove num_key_columns() method in schema, and replace all its usage with key_columns().

FYI, num_key_columns is only used in a few places, so this change may not be very large. image

jiacai2050 avatar Sep 15 '22 15:09 jiacai2050

maybe I can try to fix this, can it be assigned to me?

dust1 avatar Sep 18 '22 14:09 dust1

Sure, go ahead.

jiacai2050 avatar Sep 19 '22 02:09 jiacai2050