
sql: make the CockroachDB integer types more compatible with postgres

Open knz opened this issue 6 years ago • 44 comments

| Type | CockroachDB | Postgres |
|---|---|---|
| int | primitive type, 64-bit | alias for int4, never shows up in vtables |
| int2 | alias for int (thus wrong width), shows up as "int2" in certain vtables | primitive type, 16-bit |
| int4 | alias for int (thus wrong width), shows up as "int4" in certain vtables | primitive type, 32-bit |
| int8 | alias for int, shows up as "int8" in certain vtables | primitive type, 64-bit |
| bigint | alias for int, shows up as "bigint" in certain vtables | alias for int8, never shows up in vtables |
| smallint | alias for int (thus wrong width), shows up as "smallint" in certain vtables | alias for int2, never shows up in vtables |
| serial | special form for int (thus wrong width) with default unique_rowid() | special form for int4, create sequence and default nextval(seqname) |
| bigserial | alias for serial | special form for int8, create sequence and default nextval(seqname) |
| smallserial | alias for serial (thus wrong width) | special form for int2, create sequence and default nextval(seqname) |
| bit | near-alias for int, shows up as bit in certain vtables, max 64-bit | primitive type, arbitrary-length byte array with bit input/output representations |

Problems:

  • CockroachDB types int, int2, int4, serial, smallserial, bit have data widths that are fully incompatible with PostgreSQL
  • CockroachDB incorrectly preserves the names "bigint" and "smallint" whereas postgres desugars them to int8 and int2 during parsing.
  • PostgreSQL serial types use sequences; CockroachDB does not
  • bit incorrectly has a maximum size (and its underlying implementation is currently incorrect)

Fixing this would comprehensively address #24062 #22607 #26730 #25098 #24686 and possibly others.

Informs #26128.

Jira issue: CRDB-4979

knz avatar Jun 22 '18 11:06 knz

@awoods187 I'd like to work to address this over the summer (next milestone or the one after that). This would solve a large class of compatibility problems.

knz avatar Jun 22 '18 11:06 knz

/cc @andy-kimball

benesch avatar Jul 03 '18 21:07 benesch

  • PostgreSQL serial types use sequences; CockroachDB does not - planned for 2.1
  • CockroachDB incorrectly preserves the names "bigint" and "smallint" whereas postgres desugars them to int8 and int2 during parsing - planned for 2.1
  • bit incorrectly has a maximum size (and its underlying implementation is currently incorrect) - planned in the 2.2 time frame, but may be considered a bugfix and backported to 2.1 (with a documented known limitation at the 2.1 release)
  • CockroachDB types int, int2, int4, serial, smallserial, bit have data widths that are fully incompatible with PostgreSQL -- actually that's 2 different issues:
    • the visible data width, announced in the introspection tables (information_schema, pg_catalog), is wrong - we can fix this in 2.1 by lying to clients and making them believe what they need to believe
    • the actual data width, which restricts the set of possible values for a given data type - this becomes a known limitation in 2.1, with a fix planned for 2.2

knz avatar Aug 07 '18 16:08 knz

Here's a micro-RFC that I'd like to get feedback on before starting on the implementation.

Updated 2018-12-05

Motivation:

We currently assume that INT really means INT8; however, this is inconsistent with PostgreSQL and various (Java) ORMs that assume INT means INT4. We want a transition path such that INT means INT4 in CockroachDB.

Goals

  • Code that works correctly with INT --> INT8 on 2.1 will work correctly on 2.2 and also on 2.Next with only a flag change.
  • Existing database schemas will not change, but it's fine to tweak their introspection metadata to accurately reflect their configuration.
  • Handle the INT --> INTx mapping as close to parse time as possible to avoid needing to plumb type-mapping logic down into all of the various ColumnType support functions or schema introspection tables.
  • Present identical data and table-introspection information to all callers, regardless of whether they assume 4- or 8-byte INT values, to minimize cognitive load and change-sprawl.

Plan

  • [2.2] PR #32831 makes INT an alias for INT8
  • [2.2] PR #32848 adds a cluster and session setting default_int_size whose valid values are 8 (the default) and 4.
  • [2.2] Ensure that all parsing of type INT or INTEGER reflects the default_int_size setting by substituting INT4 or INT8 as appropriate.
    • Exception: When a column's default is set to exactly unique_rowid(), it should be an INT8.
  • ~[2.2] Add a cluster-version ratchet which will upgrade all existing TableDescriptors to ensure that any SemanticType=INT columns with unspecified Width are reset to Width=64.~
    • This won't be necessary since all user inputs where INT := INT4 will simply have INT4 in the parse tree. Other SQL not coming in from user input will continue to use INT := INT8 to preserve the historical behavior.
  • [2.2] Bugfix #16769: pg_attribute should return correct atttypid values, instead of always returning 20 (INT8) for all int types.
  • [2.2] Bugfix: information_schema.columns to remove character_maximum_length for int types, which pg leaves unpopulated. The docs on ColumnType.MaxCharacterLength() are inconsistent with the implementation.
  • ~[2.2] Test: The existing information_schema.column.crdb_sql_type column should return pre-2.2 values until the INT --> INT8 upgrade ratchet has fired.~
  • [2.2] Test: Update the information_schema and pg_class logic tests to match pgsql for e.g. create table x(a int, aa integer, b int4, c int8, d bigint, e int2, f smallint); when default_int_size=4 and document differences when in 8-byte mode.
  • [2.2] Test: Ensure that cast operations '1'::INT result in the correct data width.
  • [2.2] Telemetry: Record how often default_int_size is set.
  • [2.Next] Seeing an Int-type ColumnType with Width == 0 is an error (e.g. types.go: FromColumnType()).
  • [2.Next] Switch default_int_size to 4.
  • [2.Next] Remove the unnecessary ColumnType.VisibleType values, since we can always back them out of the non-zero ColumnType.Width.
  • [2.Future] Remove default_int_size settings.
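
The parse-time substitution the plan describes can be sketched as follows. This is an illustrative sketch only, not CockroachDB's actual code; the function name `resolveIntType` and the boolean flag for the `unique_rowid()` default are assumptions made for the example:

```go
package main

import "fmt"

// resolveIntType sketches the parse-time rule described above: a bare
// INT/INTEGER is rewritten to INT4 or INT8 according to the session's
// default_int_size, except that a column whose default expression is
// exactly unique_rowid() must stay INT8, since that builtin returns a
// 64-bit value.
func resolveIntType(name string, defaultIntSize int, defaultIsUniqueRowID bool) string {
	if name != "INT" && name != "INTEGER" {
		return name // explicitly sized types pass through unchanged
	}
	if defaultIsUniqueRowID {
		return "INT8"
	}
	if defaultIntSize == 4 {
		return "INT4"
	}
	return "INT8"
}

func main() {
	fmt.Println(resolveIntType("INT", 4, false))  // INT4
	fmt.Println(resolveIntType("INT", 8, false))  // INT8
	fmt.Println(resolveIntType("INT", 4, true))   // INT8
	fmt.Println(resolveIntType("INT2", 4, false)) // INT2
}
```

Because the substitution happens when building the parse tree, nothing downstream (type checking, descriptors, introspection) ever sees an unsized INT, which is the point of goal 3 above.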

Mixed-version notes

There shouldn't be any need to alter the ColumnType message. The only observable change for a 2.1 node is that 2.2 gateways would never create a ColumnType { SemanticType: INT, Width: 0 }; the width would always be set to some explicit value.

If a column is created on a 2.1 gateway in the mixed-mode case, the upgrade ratchet will update the TableDescriptor to act as though it had been created with INT --> INT8 semantics on a 2.2 gateway.
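
The upgrade ratchet mentioned above (rewriting an unspecified width to an explicit 64) could look roughly like this. `ColumnType` here is a simplified stand-in for the real descriptor protobuf, and `upgradeIntWidths` is a hypothetical name, not the actual migration code:

```go
package main

import "fmt"

// ColumnType loosely mirrors the descriptor fields discussed above;
// the real protobuf message has many more fields.
type ColumnType struct {
	SemanticType string
	Width        int
}

// upgradeIntWidths applies the ratchet: any INT column stored with an
// unspecified width (0) is rewritten to an explicit 64, matching the
// historical INT := INT8 behavior.
func upgradeIntWidths(cols []ColumnType) {
	for i := range cols {
		if cols[i].SemanticType == "INT" && cols[i].Width == 0 {
			cols[i].Width = 64
		}
	}
}

func main() {
	cols := []ColumnType{{"INT", 0}, {"INT", 32}, {"STRING", 0}}
	upgradeIntWidths(cols)
	fmt.Println(cols) // [{INT 64} {INT 32} {STRING 0}]
}
```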

bobvawter avatar Nov 08 '18 21:11 bobvawter

LGTM

Exception: When a column's default is set to an INT8-returning function, e.g. unique_rowid(), it should be an INT8.

When this rule is invoked we should emit a warning.

bdarnell avatar Nov 09 '18 17:11 bdarnell

I recommend you file separate issues linked to this one for the various required bug fixes.

Exception: When a column's default is set to an INT8-returning function, e.g. unique_rowid(), it should be an INT8.

I would restrict that to the expression unique_rowid() specifically, not just any INT expression.

LGTM otherwise! Thanks for the analysis.

knz avatar Nov 12 '18 10:11 knz

@bdarnell

When this rule is invoked we should emit a warning.

We don't have infrastructure for pgwire warnings yet.

Personally for UX I would prefer introducing table/column comments (#19472) and put the warning in a comment instead.

knz avatar Nov 12 '18 10:11 knz

Hello, thought I would chime in with my experiences with this as a new cockroachdb user.

I was trying to experiment with cockroachdb instead of postgres with a webapp I'm making, and ran into the problem of INT not being equivalent to INT4, specifically in the postgres SERIAL type.

The app is written with Rust, and using the diesel ORM so the types are pretty strictly enforced.

This seems like a very common setup in postgres/sql so would make migration easier... Here's a search on github showing over 200k instances of this:

https://github.com/search?q=%22+SERIAL+PRIMARY+KEY%22&type=Code

I'm wondering if there will ever be a way to have a SERIAL column in cockroachdb that is an INT4? Seems weird to advertise postgres compatibility when this isn't supported.

edit: my post came off a bit too negative. Now that I think about this some more it isn't too bad. I would just have to migrate my db to use bigserial and change my app code from i32 to i64 in a couple places. The downsides of that are that the ids in my urls are a bit longer (not that big of deal), and that I 'waste' 4 extra bytes per entry. I have fewer than a million rows so that change is pretty insignificant.

edit2: ran into another problem now. My ORM assumes that the returned value is the width of the data type. So I have an INT4 stored and it gets returned to me from cockroach as an INT8, erasing some type data I had and taking up more space. I don't need 64 bits to represent a number between 1 and 100.

nocduro avatar Nov 23 '18 01:11 nocduro

@nocduro we are working on it.

In the meantime, please know that we have introduced a partial compatibility mode in CockroachDB, so you can use INT4 with a sequence for SERIAL. See for yourself:

[email protected]:42763/defaultdb> set experimental_serial_normalization = sql_sequence;
SET

[email protected]:42763/defaultdb> create table t(x serial4);
CREATE TABLE


[email protected]:42763/defaultdb> show create t;
  table_name |                     create_statement
+------------+-----------------------------------------------------------+
  t          | CREATE TABLE t (
             |     x INT4 NOT NULL DEFAULT nextval('t_x_seq1':::STRING),
             |     FAMILY "primary" (x, rowid)
             | )
(1 row)

knz avatar Nov 23 '18 11:11 knz

The gist of this is that you need 1) to set experimental_serial_normalization and 2) use INT4, SERIAL4 etc explicitly (the word "INT" without a size will still automatically translate to INT8).

Regarding your final note:

edit2: ran into another problem now. My ORM assumes that the returned value is the width of the data type. So I have an INT4 stored and it gets returned to me from cockroach as an INT8...

Please provide more information. If this is a bug we need to fix it.

knz avatar Nov 23 '18 11:11 knz

Status update:

Right now, I’ve explored three different ways of getting this INT change in:

  1. Plumb EvalContext into everywhere that INT-as-a-type gets used:
    • https://github.com/cockroachdb/cockroach/compare/cfd53f62f038a06e34570f11ec02bc74dabd2aab...bobvawter:int4
    • I'm running into what appears to be type "erasure" in opt when converting from coltypes.T -> types.T -> coltypes.T
    • This manifests in tests like SELECT a::INT FROM (SELECT '4294967296' as a) when in INT4-mode. The optimizer unwinds this to a DInt and we lose the cast to what should be INT4, along with the expected range-check error.
  2. Tweak the parser to never kick out an INT into our AST:
    • https://github.com/cockroachdb/cockroach/compare/93632938e9ea7d86a56ec8423f2eff66c30965d5...bobvawter:int4-2
    • This approach adds a field to our Scanner type, which implements sqlLexer. The sql.y grammar is changed so that receiving an INT token delegates to the Scanner.defaultIntType field.
    • Parsing of SQL statements is performed in conn, which adds the generated AST to a buffer to be drained by connExecutor. I was running into problems finding a path from conn to the SessionData owned by connExecutor.
  3. If we can't tweak the parser, then we ought to be able to modify the incoming AST in connExecutor:
    • The tree.Statement and tree.Expr types are currently hit-or-miss in terms of being able to run a visitor over them.
    • It's a huge surface area to manually maintain the traversal code.
    • I'm sketching out an idea for what a code-generated visitor API could look like here: https://github.com/cockroachdb/cockroach/pull/32628/files
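
To give a sense of what option 3 entails, here is a minimal sketch of a generic AST walk rewriting naked INT references. The `Expr`, `IntTypeRef`, and `CastExpr` types are toy stand-ins invented for this example, not the real tree package:

```go
package main

import "fmt"

// Expr is a tiny stand-in for tree.Expr; the real AST is far larger,
// which is why hand-maintained traversal code is a burden.
type Expr interface{ Children() []Expr }

type IntTypeRef struct{ Width int } // Width 0 models a naked INT
type CastExpr struct {
	Inner Expr
	Type  *IntTypeRef
}

func (t *IntTypeRef) Children() []Expr { return nil }
func (c *CastExpr) Children() []Expr   { return []Expr{c.Inner, c.Type} }

// walk applies fn to every node. A uniform visitor API like this is
// what makes an AST-wide rewrite such as INT -> INT4 tractable.
func walk(e Expr, fn func(Expr)) {
	if e == nil {
		return
	}
	fn(e)
	for _, c := range e.Children() {
		walk(c, fn)
	}
}

func main() {
	ast := &CastExpr{Inner: &IntTypeRef{Width: 0}, Type: &IntTypeRef{Width: 0}}
	walk(ast, func(e Expr) {
		if t, ok := e.(*IntTypeRef); ok && t.Width == 0 {
			t.Width = 32 // session has default_int_size=4
		}
	})
	fmt.Println(ast.Type.Width) // 32
}
```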

Based on a chat yesterday with @jordanlewis, I think option 2 is once again viable since we have identified a way to get to the SessionData from conn.

Regardless of implementation complexity, option 2 would be the obvious approach if we could make a knife-edge cutover: We want to make INT behave like FLOAT in which we assign an explicit size to the type whenever you use it.

Open question:

  • There are many places in the code where SQL fragments get re-parsed (e.g. CHECK statements, computed column expressions). Presumably, this new behavior should be applied only to statements coming in from the user. When it's time to do the cleanup migration, having a more robust visitor API could help make that much more fool-proof to implement.

bobvawter avatar Nov 27 '18 13:11 bobvawter

For the record my initial attempt went towards solution 2, then I got stuck in a different place than the one you identify. So my suspicion is that there will be multiple obstacles there.

For solution 1 you can look at https://github.com/bobvawter/cockroach/pull/1 - I think we can push forward there.

I do think we need to spend time on solution 3 no matter what. I'd just like that work direction to be decoupled from our roadmap goals so it does not become time-pressured.

knz avatar Nov 27 '18 16:11 knz

Meeting notes about solution 2.

  • there are 20+ callers of ParseOne and 10+ callers of Parse
  • a majority of these are used to re-parse SQL that's been stored inside the DB - we can handle as follows:
    • implement a migration that rewrites all the SQL that's already stored
    • make all these call points use a setting where a naked "INT" becomes invalid (ie assertion that it's not occurring)
    • this also requires that further insertions of SQL into the db (eg by nodes in a mixed-version config that are not yet upgraded) ultimately get rewritten too. We need a cluster version ratchet for this.
  • then there is a group of callers that are not internal:
    • SQL received from clients (in pgwire), this is what is mentioned in the comment above https://github.com/cockroachdb/cockroach/issues/26925#issuecomment-442054112
    • SQL constructed by logical planning and re-parsed at distsql processors
      • we need to audit the logical planning code to ensure it never synthesizes a naked INT (e.g. by deleting coltypes.Int altogether)
    • cockroach sqlfmt
    • SHOW SYNTAX
    • Backup/restore
    • IMPORT

For restore + import + show syntax we need to plumb the parameter for correctness.

So for solution 2, overall action item list is:

  • plumbing the configuration parameter through pgwire, backup/restore, import, show syntax, sqlfmt
  • verify no naked INT is synthesized
  • define an upgrade ratchet
  • implement a migration, but only run it on cluster version upgrade

For the plumbing into pgwire specifically,

  • jordan/bob found a way to channel the parameter from the session data into pgwire handleParse
  • open questions:
    • what happens with prepared statements across changes to the session var
    • does the answer to the last question influence our handling of placeholders (hints vs actual types for INT scalar expressions)
    • what happens with stmt/audit logging (do we want to log before/after transform)
    • [bob] a possible bypass of these questions is to force the setting to be initialized just once at the beginning of the session (eg via a status parameter)
      • question becomes is this workable in common clients frameworks

knz avatar Nov 27 '18 18:11 knz

About solution 1:

  • [knz] IIRC coltypes appear (or are synthesized) in the syntax only in 3 contexts
    • scalar operators, we have exactly N of them with N small. Off the top of my head: CAST, ANNOTATE_TYPE and IS OF TYPE -- all these go through a single path of TypeCheck, where the naked INT could be eliminated (at that point)
    • CREATE and ALTER (ADD COLUMN / SET TYPE) -- all of which go through MakeColumnDefDescs
    • IMPORT (perhaps, but unlikely, RESTORE)
  • this is what guided my own opinion that solution 1 requires less work in total.
  • there is a little thing to be considered, which is that TypeCheck is not meant to modify AST nodes in-place (it can only eliminate them), so in the special case of CAST which needs to keep the result of the desugaring (ANNOTATE_TYPE and IS OF TYPE are eliminated during type checking), we need to keep the result of the transform.
    • in the heuristic planner, the TypeCheck can modify in-place - that's automatically supported (although we haven't recommended that, but perhaps we can make an exception because heuristic code is meant to go away)
    • in the opt code the result must be translated to the scalar expression for CastExpr - see change in https://github.com/bobvawter/cockroach/pull/1/files

knz avatar Nov 27 '18 18:11 knz

@knz what's the status of this item? Can we get some sort of checkbox for what has already been addressed?

awoods187 avatar Mar 15 '19 23:03 awoods187

Hey @knz - quick question. Customer just asked about whether or not this issue would address https://github.com/jOOQ/jOOQ/issues/8478. It looks like JOOQ's problem has to do with strings and names and I don't see them mentioned here.

tim-o avatar Apr 03 '19 12:04 tim-o

@awoods187 I'll need to build an explanatory summary — with what works and what doesn't. Please come back to me to remind me, chances are I will lose track of this particular issue.

@tim-o the linked jOOQ issue is wholly unrelated. I have replied separately on the linked thread.

knz avatar Apr 03 '19 13:04 knz

So, upon request by @andy-kimball and @awoods187 I have analyzed the situation further.

Current situation / remaining problems

  1. Generally, CockroachDB has stopped producing column descriptors with width=0 since 19.1 (thanks to Bob's and my changes).

    However, consider mixed-version compatibility: because a 2.0/2.1 node could still produce Width=0, a 19.1 node has to be able to understand it. In 19.2 this is no longer necessary, so we can probably drop this support. However, this requires a descriptor upgrade function, so that any descriptor created/stored pre-19.2 is upgraded to width=64. This work remains to be done.

  2. As to the behavior in the future, there is a challenge we need to overcome by providing a good solution. The problem to solve is a contradiction:

    • Generally, a postgres client expects that if they specify a type "X" in CREATE, it will show as "X" in introspection. More technically, it expects its "catalog type name" (the name stored in ~~pg_catalog.pg_types.typname~~ information_schema.columns.data_type) to come back out.

    • The name "integer" in the input syntax in Postgres maps internally to its UDT int4, and conveniently the catalog name of int4 is .... integer. This means clients that enter "integer" in CREATE also expect "integer" in introspection.

    • Meanwhile, the name "integer" in the input syntax in CockroachDB maps internally to the "udt" int8, and its catalog name is .... well ... we haven't decided that yet.

      In both v2.0 and v2.1, we kept it to "int"/"integer" by using width=0, so that the introspection would match the input. In v19.1, we have broken the introspection by forcing a width.

    (The contradiction is between the last two bullet points)

The way I see this is the following:

  • we should commit to a separation between catalog name and internal type/UDT, like in postgres. ~~And then preserve the pg exception and say that int in the input syntax maps to some internal type (eg int8) and have that internal type reflect back as integer in introspection. I think this is a moderately easy step.~~ (Edit: see my comment below)
  • we should further extend the compatibility knob default_int_size to also influence introspection somehow. Today, that flag changes the mapping from input syntax to internal type, but by doing so it breaks the input-introspection loop. We need to further change it so that the input-introspection loop is preserved for clients that are dead-set on 32-bit integers. When the flag is set, either the internal type int4 or int8 can map back to int in introspection (not just int8)
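
The input-introspection loop described above can be sketched with two hypothetical helper functions (names and signatures are assumptions for illustration, not actual CockroachDB code): one maps input syntax plus default_int_size to the internal type, the other maps the internal type back to its information_schema name:

```go
package main

import "fmt"

// internalType maps an input type name plus default_int_size to the
// internal type/UDT, modeling the creation-time mapping discussed above.
func internalType(input string, defaultIntSize int) string {
	switch input {
	case "int", "integer":
		if defaultIntSize == 4 {
			return "int4"
		}
		return "int8"
	case "smallint":
		return "int2"
	case "bigint":
		return "int8"
	}
	return input
}

// dataTypeName maps the internal type back to its SQL-standard name,
// as reported in information_schema.columns.data_type.
func dataTypeName(internal string) string {
	switch internal {
	case "int2":
		return "smallint"
	case "int4":
		return "integer"
	case "int8":
		return "bigint"
	}
	return internal
}

func main() {
	// With default_int_size=4, "integer" round-trips: integer in,
	// integer back out. With the 8-byte default the loop is broken.
	fmt.Println(dataTypeName(internalType("integer", 4))) // integer
	fmt.Println(dataTypeName(internalType("integer", 8))) // bigint
}
```

This makes the contradiction concrete: only the 4-byte setting preserves the loop that postgres clients expect, which is why the knob must also influence introspection.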

knz avatar Apr 04 '19 08:04 knz

@andy-kimball @awoods187 Note: the last column in the table above says "19.2" however I think it may be worth making this a bug fix and release it in a 19.1 patch release. The experience of clients will be rather uncomfortable until we fix that.

knz avatar Apr 04 '19 08:04 knz

Erratum: in my explanation above the "catalog type name" shows up in information_schema, not in pg_catalog. The remainder of the argument remains the same.

knz avatar Apr 04 '19 09:04 knz

@andy-kimball for your education specifically: PostgreSQL has two separate introspection facilities for type names:

  • pg_catalog.pg_types.typname shows the "OID name" for the type, this is used throughout pg_catalog and is usually what's used by pg-specific applications. In that column, you'll see float8 when a column was created as FLOAT, or _int2 if a column was created as INT2[].

  • information_schema.columns.data_type shows the "information schema name" (what I called "catalog name" above) for the type. This follows the SQL standard names for types. In that column, you'll see double precision when a column was created as FLOAT, or ARRAY if a column was created as INT2[] (other columns are used to further detail the type)

Check this out.

Introspection for integer types in PostgreSQL

kena=> create table t(a int, b int4, c int8, d int2);
CREATE TABLE

kena=> select column_name,data_type from information_schema.columns where table_name='t';
 column_name | data_type
-------------+-----------
 a           | integer
 b           | integer
 c           | bigint
 d           | smallint
(4 rows)


kena=> select a.attname, t.typname from pg_attribute a, pg_class c, pg_type t where a.attrelid=c.oid and c.relname='t' and a.atttypid=t.oid and a.attname in ('a','b','c','d');
 attname | typname
---------+---------
 a       | int4
 b       | int4
 c       | int8
 d       | int2
(4 rows)

In CockroachDB 19.1

[email protected]:35327/defaultdb> create table t(a int, b int4, c int8, d int2);

CREATE TABLE

[email protected]:35327/defaultdb> select column_name,data_type from information_schema.columns where table_name='t';
  column_name | data_type
+-------------+-----------+
  a           | bigint
  b           | integer
  c           | bigint
  d           | smallint
  rowid       | integer
(5 rows)

[email protected]:35327/defaultdb> select a.attname, t.typname from pg_attribute a, pg_class c, pg_type t where a.attrelid=c.oid and c.relname='t' and a.atttypid=t.oid and a.attname in ('a','b','c','d');
  attname | typname
+---------+---------+
  a       | int8
  b       | int8
  c       | int8
  d       | int8
(4 rows)

(the pg_catalog introspection is utterly broken)

(Also, the rowid should be hidden in information_schema.columns. That's an unrelated bug. I hadn't noticed before.)

In CockroachDB 2.1

[email protected]:39805/defaultdb> create table t(a int, b int4, c int8, d int2);
CREATE TABLE

[email protected]:39805/defaultdb> select column_name,data_type from information_schema.columns where table_name='t';
  column_name | data_type
+-------------+-----------+
  a           | integer
  b           | integer
  c           | bigint
  d           | smallint
  rowid       | integer
(5 rows)

[email protected]:39805/defaultdb> select a.attname, t.typname from pg_attribute a, pg_class c, pg_type t where a.attrelid=c.oid and c.relname='t' and a.atttypid=t.oid and a.attname in ('a','b','c','d');
  attname | typname
+---------+---------+
  a       | int8
  b       | int8
  c       | int8
  d       | int8
(4 rows)

What we want instead

The desired output will be, with default_int_size = 8:

[email protected]:39805/defaultdb> select column_name,data_type from information_schema.columns where table_name='t';
  column_name | data_type
+-------------+-----------+
  a           | bigint
  b           | integer
  c           | bigint
  d           | smallint
  rowid       | bigint
(5 rows)

[email protected]:39805/defaultdb> select a.attname, t.typname from pg_attribute a, pg_class c, pg_type t where a.attrelid=c.oid and c.relname='t' and a.atttypid=t.oid and a.attname in ('a','b','c','d');
  attname | typname
+---------+---------+
  a       | int8
  b       | int4
  c       | int8
  d       | int2
(4 rows)

And when default_int_size = 4:

[email protected]:39805/defaultdb> select column_name,data_type from information_schema.columns where table_name='t';
  column_name | data_type
+-------------+-----------+
  a           | integer
  b           | integer
  c           | bigint
  d           | smallint
  rowid       | bigint
(5 rows)

[email protected]:39805/defaultdb> select a.attname, t.typname from pg_attribute a, pg_class c, pg_type t where a.attrelid=c.oid and c.relname='t' and a.atttypid=t.oid and a.attname in ('a','b','c','d');
  attname | typname
+---------+---------+
  a       | int4
  b       | int4
  c       | int8
  d       | int2
(4 rows)

knz avatar Apr 04 '19 09:04 knz

I am just noticing that CockroachDB 19.1 still produces Width=0 for the auto-generated rowid column. That's not good. It will prevent removing that logic in 19.2 due to cross-version compatibility, unless we fix it in a 19.1 patch release.

knz avatar Apr 04 '19 09:04 knz

@kena thanks for the detailed report, as this is something I need to care about while maintaining support for the CockroachDB driver in SQLBoiler (https://github.com/glerchundi/sqlboiler-crdb).

Good job! 👏🏻


glerchundi avatar Apr 04 '19 11:04 glerchundi

@andy-kimball @jordanlewis @knz what's left to do here after unifying the type changes?

awoods187 avatar May 01 '19 04:05 awoods187

@awoods187 there are a couple of small catalog changes that need to be made. So far I don't believe them to be particularly important.

The main decision point is whether or not we want to change what integer means from int8 to int4 by default. I personally believe that we should not change this: I don't think it's a major sticking point, changing it would be a headache, and people or tests who really need this (for some reason) can change the session variable.

jordanlewis avatar Sep 26 '19 17:09 jordanlewis

I think it is a major sticking point - java and javascript are both sensitive to integer sizes, and session variables are awkward to use in many client drivers. I'm not sure whether it clears the bar for a significantly backwards-incompatible change, but I think maybe yes.

Note that it wouldn't make sense to change integer to int4 without also changing serial to serial4 (and therefore defaulting experimental_serial_normalization to sql_sequence). That has a big performance impact, and I'm more scared of making this change than just changing the integer type.

bdarnell avatar Sep 26 '19 17:09 bdarnell

Yes, they're sensitive to integer sizes, but since we now correctly report the type OIDs for all integer types (we didn't in 19.1), drivers will not have problems when they see an int8 for an integer.

jordanlewis avatar Sep 26 '19 17:09 jordanlewis

The grand types refactor had at least one nice benefit!

andy-kimball avatar Sep 26 '19 17:09 andy-kimball

Could we leave the cluster default at int8 if it's an existing cluster, but default to int4 for new clusters? i.e. treat it as a cluster setting rather than a session setting

andy-kimball avatar Sep 26 '19 17:09 andy-kimball

Could we leave the cluster default at int8 if it's an existing cluster, but default to int4 for new clusters? i.e. treat it as a cluster setting rather than a session setting

It's actually both a cluster and a session setting (the cluster setting is used as the default for the session setting), so that's exactly what would happen if we made this change. Existing clusters would continue to use integer=int8 while new ones would use integer=int4. So we wouldn't break any existing clusters, but cause a divergence between older prod clusters and newer dev/test clusters.

bdarnell avatar Sep 26 '19 17:09 bdarnell