greptimedb icon indicating copy to clipboard operation
greptimedb copied to clipboard

Support altering column's data type

Open MichaelScofield opened this issue 11 months ago • 7 comments

What problem does the new feature solve?

Sometimes user just want to modify the column's data type.

What does the feature do?

Make column's data type changeable via "alter column" stmt. For example, in mysql:

ALTER TABLE t1 MODIFY b INT

(see https://dev.mysql.com/doc/refman/8.0/en/alter-table.html). Or in postgresql:

ALTER TABLE t1 ALTER COLUMN b TYPE varchar(80)

(see https://www.postgresql.org/docs/current/sql-altertable.html).

We might want to go with mysql's "alter column" syntax for the conciseness.

One critical consideration is how to deal with un-convertable data. For example, when you modify the column from string to int, how to deal with the un-convertable data of "a"? In mysql, this results in an error:

mysql> select * from foo;
+------+
| s    |
+------+
| a    |
| 1    |
+------+
2 rows in set (0.01 sec)

mysql> alter table foo modify s int;
ERROR 1366 (HY000): Incorrect integer value: 'a' for column 's' at row 1
mysql> show create table foo;
+-------+----------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                       |
+-------+----------------------------------------------------------------------------------------------------+
| foo   | CREATE TABLE `foo` (
  `s` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

In postgresql, it's error, too:

postgres=# select * from foo;
 s
---
 a
 1
(2 rows)

postgres=# ALTER TABLE foo ALTER COLUMN s TYPE int;
ERROR:  column "s" cannot be cast automatically to type integer
HINT:  You might need to specify "USING s::integer".
postgres=# ALTER TABLE foo ALTER COLUMN s TYPE int using s::integer;
ERROR:  invalid input syntax for type integer: "a"
postgres=# \d foo
               Table "public.foo"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 s      | text |           |          |

(Interestingly, postgresql has a "hint" to let user specify the data convention method. We might want to do the same.)

So it's clear that we'd better follow their steps and return some errors complaining some data are just not convertable to the target type.

Implementation challenges

The performance is clearly the most challenging part. If we actually change the data's type in the underlying storage, we might need to rewrite all the affected ssts, which can be a lot of work. If we change the reader (so that the data is converted on the fly), it brings extra query cost. Neither is good.

MichaelScofield avatar Mar 14 '24 08:03 MichaelScofield

Step for modify column

  • Parser supports Modify Column: https://github.com/sqlparser-rs/sqlparser-rs/pull/1216
  • add ModifyColumn on AlterTableOperation and alter_expr::Kind(greptime-proto)
  • mapping alter_expr::Kind::ModifyColumn to process the logic of modify column: https://github.com/GreptimeTeam/greptimedb/pull/3757
  • CompatReader supports Column Cast: https://github.com/GreptimeTeam/greptimedb/pull/3745
    • Tips: the data needs to be type converted before the next compaction, so there is a column type conversion performance problem when reading.

How it works?: https://github.com/GreptimeTeam/greptimedb/pull/3701#issuecomment-2056265559

There is still one issue that has not yet been resolved:

When modify column, it need to read all the data in this column of the table and check whether each piece of data can be converted e.g. String is type-convertible to Integer, but hello cannot be converted to Integer.

How do I implement it in src/store-api/src/region_request.rs ModifyColumn::validate(I think it's best handled here)?

KKould avatar Apr 18 '24 03:04 KKould

Validating whether all data are valid is too expensive. We have some other options if we can't cast the value

  • fill by NULL (if the column is nullable)
  • fill by a default value

Maybe we can specify which value to fill in the ALTER statement.

evenyag avatar Apr 18 '24 07:04 evenyag

Maybe we can specify which value to fill in the ALTER statement.

I love this idea and will implement it when PR: https://github.com/sqlparser-rs/sqlparser-rs/pull/1216 is merged if you think my steps are feasible

KKould avatar Apr 18 '24 13:04 KKould

We can build this feature in a bottom-up pattern. Let's modify the CompatReader first. Then we can implement altering column type in the mito2 engine and other engines. The parser support should be the last step. This ensures that we don't leak an unimplemented feature to users.

We can also set up a simple rule.

  • fill NULL if the column is nullable
  • fill default if the column is not null

PostgreSQL's syntax is enough so I don't think we must support MySQL's alter syntax.

evenyag avatar Apr 18 '24 14:04 evenyag

Got it, then I will implement it in the reverse order of steps

KKould avatar Apr 18 '24 15:04 KKould

Hi @KKould, We support flushing and compacting a table via SQL. Would you like to add a fuzz test for column type alteration after #3796?

WenyXu avatar Apr 24 '24 13:04 WenyXu

Hi @KKould, We support flushing and compacting a table via SQL. Would you like to add a fuzz test for column type alteration after #3796?

We might also add some tests in this file. For example: insert data, change column type, insert data, scan

https://github.com/GreptimeTeam/greptimedb/blob/main/src/mito2/src/engine/alter_test.rs

evenyag avatar Apr 24 '24 13:04 evenyag

Hi @KKould, We support flushing and compacting a table via SQL. Would you like to add a fuzz test for column type alteration after #3796?

yep, after https://github.com/GreptimeTeam/greptimedb/pull/3796 merged

KKould avatar Apr 29 '24 06:04 KKould