dolt icon indicating copy to clipboard operation
dolt copied to clipboard

Support Generated Columns

Open VinaiRachakonda opened this issue 2 years ago • 3 comments

Consider the following table definition

CREATE TABLE `server_cost` (
  `cost_name` varchar(64) NOT NULL,
  `cost_value` float DEFAULT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `comment` varchar(1024) DEFAULT NULL,
  `default_value` float GENERATED ALWAYS AS ((case `cost_name` when _utf8mb3'disk_temptable_create_cost' then 20.0 when _utf8mb3'disk_temptable_row_cost' then 0.5 when _utf8mb3'key_compare_cost' then 0.05 when _utf8mb3'memory_temptable_create_cost' then 1.0 when _utf8mb3'memory_temptable_row_cost' then 0.1 when _utf8mb3'row_evaluate_cost' then 0.1 else NULL end)) VIRTUAL,
  PRIMARY KEY (`cost_name`)
);

We should support Generated Columns for Mysql comptability

VinaiRachakonda avatar Mar 25 '22 18:03 VinaiRachakonda

Parsing is now supported. Functionality is not yet implemented.

VinaiRachakonda avatar Apr 06 '22 17:04 VinaiRachakonda

With Dolt 0.40.11, I'm able to add a generated column using an ALTER TABLE statement. However, when querying the altered table, the newly generated column only contains null values. In contrast, if I do SELECT ... AS generated_col FROM ... I get the expected result, so I know that my generation expression is valid.

Is this behavior still yet to be implemented in Dolt? I haven't seen any mention of generated columns in the changelog for more recent releases

addisonklinke avatar Sep 23 '22 16:09 addisonklinke

Hey @addisonklinke, that's correct – Dolt is currently able to parse the generated column syntax, but it does not yet generate the data in the column. We always prioritize requests from business/paying customers as our top priority, so that's why this one hasn't bubbled to the top yet, but it's definitely a feature we want to implement.

fulghum avatar Sep 23 '22 17:09 fulghum

A potential customer just asked for this on August 1. 2023. Might be time to go after it. @zachmu is going to scope this a bit more.

timsehn avatar Aug 03 '23 20:08 timsehn

This looks to be 1-2 weeks of work. If you only need STORED columns we can prioritize that part and deliver it first. VIRTUAL columns are quite a lot more complicated, indexes on them even more so.

zachmu avatar Aug 03 '23 22:08 zachmu

Apologies for the delay, some details on how to implement STORED/VIRTUAL below if materializing VIRTUAL generated columns as NULL fields in value tuples. I suspect read/write round tripping will have fewer surprises with NULL placeholders. However, an alternate path that omits NULL values on disk exists, but with an unknown amount of extra logic to track the omissions. The implementation for STORED is basically the same for most of the query types, the write path just actually writes the value and the read path does not need to resolve defaults/project the generated columns.

Create table: Generated defaults expression strings need to be resolved and validated the same way as regular column defaults, and in all of the same locations. modifySchemaTarget and resolveSchemaDefaults do a lot of the lifting in sql/planbuilder. Metadata on sql.Schema will need to communicate generated columns with integrators.

Schema: Generated columns will need to be written in the Dolt schema. We will probably either need to add a new field to the schema protobuf to indicate that a column is generated (STORED/VIRTUAL), or have a dummy column whose column default expression string has metadata embedded that indicates that after parsing it will be a generated column (ex: /*+ GENERATED_STORED */ <expr>).

Show create table/ info_schema.columns: Generated columns need to be read/resolved the same way as column defaults. The two places to test are 1) MySQL compatibility for infoschema formatting, and 2) there are dolt functions that expect create table statements to round-trip.

Alters: Changing a column name needs to be reflected in generated column expression strings (check columns does this). All default resolution still applies. Round tripping rules for rewriting an index will be relevant (read+write path). Indexes using generated columns appear to be valid in MySQL, but we can disable and support later.

Write path: Special validation: the column must be unspecified on insert. Which means that INSERT must always specify column names excluding generated columns. The SQL row should maybe materialize a NULL as the column default for writing; round-tripping needs to work so post-validation needs to be OK with a non-NULL generated column. Write path includes INSERT, CREATE FROM, LOAD DATA.

Read path: Where we decide to insert generated values matters for round-tripping. Assuming NULLS are passed around as placeholders (for virtual), the read path will either need ResolvedTable and company to inline a projection for generated columns, or add a Projection node above ResolvedTable, or create a new TableNode row source that can wrap ResolvedTable and inject generated defaults (last option is maybe the safest). Again, round-tripping is important here. CREATE TABLE ... FROM and INSERT ... FROM, LOAD DATA ... need sources and destinations to play well together with generated columns. This will be slightly different for STORED/VIRTUAL.

Dolt merge: Variation of merging two branches with different generated defaults should be tested, even if requiring manual resolution in some/all cases. The main thing to avoid are bad merges that create schema panics/bricked databases.

A GMS memory implementation should be able to test most of this behavior. Actually writing the values to schema/disk, and merge conflicts are the two exceptions that are Dolt-specific. There are also Dolt functions that will depend on sql.Row round tripping into/out of the table that should be tested with the new generated defaults (merge, diff, import, export). An alternative path where we do not write NULLs to disk would probably need more Dolt-specific code to account for cases where round-tripping is not straightforward. I don't think we actually write data for NULL values, so I am not sure the indirection is a lot more performant.

max-hoffman avatar Aug 21 '23 21:08 max-hoffman

@addisonklinke The latest release has support for virtual columns. There are some sharp edges (not all merges work correctly), but you could start playing with it. I'll close this issue when the sharp edges are gone.

zachmu avatar Nov 03 '23 17:11 zachmu

Going to resolve this in favor of the more scoped bug here:

https://github.com/dolthub/dolt/issues/6945

zachmu avatar Nov 03 '23 17:11 zachmu