pglast
pglast copied to clipboard
invalid sql generated with char column type
If you create an AST with a column like:
ColumnDef(
'column_name',
inhcount=0,
is_from_type=False,
is_local=True,
is_not_null=False,
typeName=TypeName(
names=(String('char'),),
pct_type=False,
setof=False,
typemod=-1,
typmods=(A_Const(10),)
)
)
it will serialize to "char"(20)
which is invalid sql. It should just be char(20)
I think char
is special:
$ pgpp -tS "create table foo (bar char(20))"
[{'@': 'RawStmt',
'stmt': {'@': 'CreateStmt',
'if_not_exists': False,
'oncommit': {'#': 'OnCommitAction',
'name': 'ONCOMMIT_NOOP',
'value': 0},
'relation': {'@': 'RangeVar',
'inh': True,
'location': 13,
'relname': 'foo',
'relpersistence': 'p'},
'tableElts': ({'@': 'ColumnDef',
'colname': 'bar',
'generated': '\x00',
'identity': '\x00',
'inhcount': 0,
'is_from_type': False,
'is_local': True,
'is_not_null': False,
'location': 18,
'storage': '\x00',
'typeName': {'@': 'TypeName',
'location': 22,
'names': ({'@': 'String',
'val': 'pg_catalog'},
{'@': 'String',
'val': 'bpchar'}),
'pct_type': False,
'setof': False,
'typemod': -1,
'typmods': ({'@': 'A_Const',
'location': 27,
'val': {'@': 'Integer',
'val': 20}},)}},)},
'stmt_len': 0,
'stmt_location': 0}]
and indeed:
$ pgpp -S "create table foo (bar char(20))"
CREATE TABLE foo (
bar char(20)
)
ya it's special, I'm guessing it's an internal alias of some sort. Here's what happens via psql:
localhost user@db=# create temp table foo(bar char(20));
CREATE TABLE
localhost user@db=# \d foo
Table "pg_temp_2.foo"
Column | Type | Collation | Nullable | Default
--------+---------------+-----------+----------+---------
bar | character(20) | | |
ya check here: https://www.postgresql.org/docs/12/datatype.html. there's a list of aliases, and it appears there are at least two levels: char -> character -> bpchar
so i guess the question is when constructing statements from AST, do you have to use the low-level types (in which case it should probably assert for invalid types), or can you use the high level types to preserve information. Otherwise you'd have information loss because you could never preserve the fact that the column is char/character/etc.
We cannot raise an exception, because we do not have access to a catalog, so we cannot determine whether the type is valid or not, it could really be a custom type, for example introduced by an extension.
perhaps then it needs to natively recognize pg aliases?
Maybe, yes: that bpchar speciality could be generalized. What about starting to collect such aliases?
is this what you're looking for?
localhost user@db=# \dTS+
List of data types
Schema | Name | Internal name | Size | Elements | Owner | Access privileges | Description
------------+-------------------------------+-------------------------------+-------+----------+-----------------------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------
pg_catalog | aclitem | aclitem | 12 | | postgres | | access control list
pg_catalog | "any" | any | 4 | | postgres | | pseudo-type representing any type
pg_catalog | anyarray | anyarray | var | | postgres | | pseudo-type representing a polymorphic array type
pg_catalog | anyelement | anyelement | 4 | | postgres | | pseudo-type representing a polymorphic base type
pg_catalog | anyenum | anyenum | 4 | | postgres | | pseudo-type representing a polymorphic base type that is an enum
pg_catalog | anynonarray | anynonarray | 4 | | postgres | | pseudo-type representing a polymorphic base type that is not an array
pg_catalog | anyrange | anyrange | var | | postgres | | pseudo-type representing a polymorphic base type that is a range
pg_catalog | bigint | int8 | 8 | | postgres | | ~18 digit integer, 8-byte storage
pg_catalog | bit | bit | var | | postgres | | fixed-length bit string
pg_catalog | bit varying | varbit | var | | postgres | | variable-length bit string
pg_catalog | boolean | bool | 1 | | postgres | | boolean, 'true'/'false'
pg_catalog | box | box | 32 | | postgres | | geometric box '(lower left,upper right)'
pg_catalog | bytea | bytea | var | | postgres | | variable-length string, binary values escaped
pg_catalog | "char" | char | 1 | | postgres | | single character
pg_catalog | character | bpchar | var | | postgres | | char(length), blank-padded string, fixed storage length
pg_catalog | character varying | varchar | var | | postgres | | varchar(length), non-blank-padded string, variable storage length
pg_catalog | cid | cid | 4 | | postgres | | command identifier type, sequence in transaction id
pg_catalog | cidr | cidr | var | | postgres | | network IP address/netmask, network address
pg_catalog | circle | circle | 24 | | postgres | | geometric circle '(center,radius)'
pg_catalog | cstring | cstring | var | | postgres | | C-style string
pg_catalog | date | date | 4 | | postgres | | date
pg_catalog | daterange | daterange | var | | postgres | | range of dates
pg_catalog | double precision | float8 | 8 | | postgres | | double-precision floating point number, 8-byte storage
pg_catalog | event_trigger | event_trigger | 4 | | postgres | | pseudo-type for the result of an event trigger function
pg_catalog | fdw_handler | fdw_handler | 4 | | postgres | | pseudo-type for the result of an FDW handler function
pg_catalog | gtsvector | gtsvector | var | | postgres | | GiST index internal text representation for text search
pg_catalog | index_am_handler | index_am_handler | 4 | | postgres | | pseudo-type for the result of an index AM handler function
pg_catalog | inet | inet | var | | postgres | | IP address/netmask, host address, netmask optional
pg_catalog | int2vector | int2vector | var | | postgres | | array of int2, used in system tables
pg_catalog | int4range | int4range | var | | postgres | | range of integers
pg_catalog | int8range | int8range | var | | postgres | | range of bigints
pg_catalog | integer | int4 | 4 | | postgres | | -2 billion to 2 billion integer, 4-byte storage
pg_catalog | internal | internal | 8 | | postgres | | pseudo-type representing an internal data structure
pg_catalog | interval | interval | 16 | | postgres | | @ <number> <units>, time interval
pg_catalog | json | json | var | | postgres | | JSON stored as text
pg_catalog | jsonb | jsonb | var | | postgres | | Binary JSON
pg_catalog | jsonpath | jsonpath | var | | postgres | | JSON path
pg_catalog | language_handler | language_handler | 4 | | postgres | | pseudo-type for the result of a language handler function
pg_catalog | line | line | 24 | | postgres | | geometric line
pg_catalog | lseg | lseg | 32 | | postgres | | geometric line segment '(pt1,pt2)'
pg_catalog | macaddr | macaddr | 6 | | postgres | | XX:XX:XX:XX:XX:XX, MAC address
pg_catalog | macaddr8 | macaddr8 | 8 | | postgres | | XX:XX:XX:XX:XX:XX:XX:XX, MAC address
pg_catalog | money | money | 8 | | postgres | | monetary amounts, $d,ddd.cc
pg_catalog | name | name | 64 | | postgres | | 63-byte type for storing system identifiers
pg_catalog | numeric | numeric | var | | postgres | | numeric(precision, decimal), arbitrary precision number
pg_catalog | numrange | numrange | var | | postgres | | range of numerics
pg_catalog | oid | oid | 4 | | postgres | | object identifier(oid), maximum 4 billion
pg_catalog | oidvector | oidvector | var | | postgres | | array of oids, used in system tables
pg_catalog | opaque | opaque | 4 | | postgres | | obsolete, deprecated pseudo-type
pg_catalog | path | path | var | | postgres | | geometric path '(pt1,...)'
pg_catalog | pg_ddl_command | pg_ddl_command | 8 | | postgres | | internal type for passing CollectedCommand
pg_catalog | pg_dependencies | pg_dependencies | var | | postgres | | multivariate dependencies
pg_catalog | pg_lsn | pg_lsn | 8 | | postgres | | PostgreSQL LSN datatype
pg_catalog | pg_mcv_list | pg_mcv_list | var | | postgres | | multivariate MCV list
pg_catalog | pg_ndistinct | pg_ndistinct | var | | postgres | | multivariate ndistinct coefficients
pg_catalog | pg_node_tree | pg_node_tree | var | | postgres | | string representing an internal node tree
pg_catalog | point | point | 16 | | postgres | | geometric point '(x, y)'
pg_catalog | polygon | polygon | var | | postgres | | geometric polygon '(pt1,...)'
pg_catalog | real | float4 | 4 | | postgres | | single-precision floating point number, 4-byte storage
pg_catalog | record | record | var | | postgres | | pseudo-type representing any composite type
pg_catalog | refcursor | refcursor | var | | postgres | | reference to cursor (portal name)
pg_catalog | regclass | regclass | 4 | | postgres | | registered class
pg_catalog | regconfig | regconfig | 4 | | postgres | | registered text search configuration
pg_catalog | regdictionary | regdictionary | 4 | | postgres | | registered text search dictionary
pg_catalog | regnamespace | regnamespace | 4 | | postgres | | registered namespace
pg_catalog | regoper | regoper | 4 | | postgres | | registered operator
pg_catalog | regoperator | regoperator | 4 | | postgres | | registered operator (with args)
pg_catalog | regproc | regproc | 4 | | postgres | | registered procedure
pg_catalog | regprocedure | regprocedure | 4 | | postgres | | registered procedure (with args)
pg_catalog | regrole | regrole | 4 | | postgres | | registered role
pg_catalog | regtype | regtype | 4 | | postgres | | registered type
pg_catalog | smallint | int2 | 2 | | postgres | | -32 thousand to 32 thousand, 2-byte storage
pg_catalog | table_am_handler | table_am_handler | 4 | | postgres | |
pg_catalog | text | text | var | | postgres | | variable-length string, no limit specified
pg_catalog | tid | tid | 6 | | postgres | | (block, offset), physical location of tuple
pg_catalog | timestamp without time zone | timestamp | 8 | | postgres | | date and time
pg_catalog | timestamp with time zone | timestamptz | 8 | | postgres | | date and time with time zone
pg_catalog | time without time zone | time | 8 | | postgres | | time of day
pg_catalog | time with time zone | timetz | 12 | | postgres | | time of day with time zone
pg_catalog | trigger | trigger | 4 | | postgres | | pseudo-type for the result of a trigger function
pg_catalog | tsm_handler | tsm_handler | 4 | | postgres | | pseudo-type for the result of a tablesample method function
pg_catalog | tsquery | tsquery | var | | postgres | | query representation for text search
pg_catalog | tsrange | tsrange | var | | postgres | | range of timestamps without time zone
pg_catalog | tstzrange | tstzrange | var | | postgres | | range of timestamps with time zone
pg_catalog | tsvector | tsvector | var | | postgres | | text representation for text search
pg_catalog | txid_snapshot | txid_snapshot | var | | postgres | | txid snapshot
pg_catalog | unknown | unknown | var | | postgres | | pseudo-type representing an undetermined type
pg_catalog | uuid | uuid | 16 | | postgres | | UUID datatype
pg_catalog | void | void | 4 | | postgres | | pseudo-type for the result of a function with no real result
pg_catalog | xid | xid | 4 | | postgres | | transaction id
pg_catalog | xml | xml | var | | postgres | | XML content
this is from PG12. What's interesting is the quotes around char in the table above. Notice the following behavior:
localhost user@db=# create temp table foo(a "char");
CREATE TABLE
localhost user@db=# drop table foo;
DROP TABLE
localhost user@db=# create temp table foo(a "char(20)");
ERROR: 42704: type "char(20)" does not exist
LINE 1: create temp table foo(a "char(20)");
^
LOCATION: typenameType, parse_type.c:275
localhost user@db=# create temp table foo(a "char" (20));
ERROR: 42601: type modifier is not allowed for type "char"
LINE 1: create temp table foo(a "char" (20));
^
localhost user@db=# create temp table foo(a char(20));
CREATE TABLE
so it seems quoted char is just for 1 char, but if you specify a length then it needs to be unquoted. So somewhat of a special case.
I see, and some heuristic is already in place:
$ pgpp -S "create table foo (bar char(20))"
CREATE TABLE foo (
bar char(20)
)
$ pgpp -S "create table foo (bar "char"(20))"
CREATE TABLE foo (
bar char(20)
)
Going back to the original case however, the first problem is that you didn't specify the fully qualified name of the type, so in any case the machinery won't work.
my point is IMO one should be able to construct an AST from aliases since you may not have a way to do that a priori. In my case I'm translating TSQL types to PSQL programmatically via AST. Otherwise I'd have to hard-code these mappings (which I have to for now), much like this library is already doing.
I'm not quite sure I understand what you're suggesting though, are you suggesting using pgcatalog.character
? That won't work either because that outputs: CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog.character(10)) ON COMMIT DROP
which is also invalid
I meant that your
typeName=TypeName(
names=(String('char'),),
pct_type=False,
setof=False,
typemod=-1,
typmods=(A_Const(10),)
)
should actually be
typeName=TypeName(
names=(String('pg_catalog'), String('char'),),
pct_type=False,
setof=False,
typemod=-1,
typmods=(A_Const(10),)
)
Does that work?
nope
>>> import pglast
>>> from pglast.stream import RawStream
>>> from pglast import ast as pg_ast
>>> from pglast import enums as pg_enums
>>>
>>> column_defs=(pg_ast.ColumnDef(
... 'column_name',
... inhcount=0,
... is_from_type=False,
... is_local=True,
... is_not_null=False,
... typeName=pg_ast.TypeName(
... names=(pg_ast.String('pg_catalog'), pg_ast.String('char'),),
... pct_type=False,
... setof=False,
... typemod=-1,
... typmods=(pg_ast.A_Const(10),)
... )
... ),)
>>>
>>> tbl_stmt = pg_ast.CreateStmt(
... if_not_exists=True, oncommit=pg_enums.OnCommitAction.ONCOMMIT_DROP,
... relation=pg_ast.RangeVar(inh=True, relname='foo', relpersistence='t'), tableElts=column_defs
... )
>>> RawStream()(tbl_stmt)
'CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog.char(10)) ON COMMIT DROP'
>>>
$ psql -h localhost -U user -p port db
psql (13.3, server 12.7 (Debian 12.7-1.pgdg100+1))
localhost iam_commodities_admin@commodities=# CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog.char(10)) ON COMMIT DROP;
ERROR: 42601: type modifier is not allowed for type "pg_catalog.char"
LINE 1: ...TE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog...
^
LOCATION: typenameTypeMod, parse_type.c:366
I see. So, pglast could have a better knowledge about aliases, as you suggested. OTOH, names=(pg_ast.String('pg_catalog'), pg_ast.String('char'),)
works now.
sorry not following, per above it doesn't work now
Strange, this works for me:
import subprocess
import pglast
from pglast.stream import RawStream
from pglast import ast as pg_ast
from pglast import enums as pg_enums
column_defs=(pg_ast.ColumnDef(
'column_name',
inhcount=0,
is_from_type=False,
is_local=True,
is_not_null=False,
typeName=pg_ast.TypeName(
names=(pg_ast.String('pg_catalog'), pg_ast.String('bpchar'),),
pct_type=False,
setof=False,
typemod=-1,
typmods=(pg_ast.A_Const(10),)
)
),)
tbl_stmt = pg_ast.CreateStmt(
if_not_exists=True, oncommit=pg_enums.OnCommitAction.ONCOMMIT_DROP,
relation=pg_ast.RangeVar(inh=True, relname='foo', relpersistence='t'), tableElts=column_defs
)
sql = RawStream()(tbl_stmt)
subprocess.run(['psql', '-e', '-c', sql, 'foo'])
and emits
CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name char(10)) ON COMMIT DROP
CREATE TABLE
remember, I'm trying to do via alias, names=(pg_ast.String('pg_catalog'), pg_ast.String('char'),),
Yes, I remember :-)
I already said above that pglast could have a better knowledge about aliases. But there's little difference for you, right now, to manage such aliases before creating the pglast.ast
tree (ie, as you go along migrating the TSQL
structure), or having such aliases taught in some way to pglast TypeName
.
As time permits, I will try to figure out if there is an easy way to handle aliases, but as of now I cannot see an easy path to do exactly what you are seeking in a generic way. Of course, ideas are welcome!
If I were you, I'd probably build a mapping between TSQL
types and pglast.ast.TypeName
.
ya I have one :) let me know if you want to keep this issue open then or add to some kind of backlog of yours.
my idea to do this programmatically was to build a string with the type create table foo (column_name char(10))
and pass this to parse_sql, and see what pglast spits out for the type mapping. This obviously is bad as it allows for string escaping attacks and what not.
so it seems pglast does have this information, perhaps this could be solved by exposing a helper method to do this mapping (which seems like pglast already has).
Can you clarify what would you do with the result of that parse? I do not understand how that can be useful for your cause...
so in my case i'd get back that char
maps to (pg_ast.String('pg_catalog'), pg_ast.String('bpchar'),),
and use that as the mapping