sqlite_fdw
sqlite_fdw copied to clipboard
Many problems with formal `text` affinity values in foreign tables
Only non - STRICT
tables in legacy SQLite databases is affected.
- There is foreign table with
bytea
column. In SQLite values usually belongs to blob affinity. In SQLiteBLOB
column also there is some blob images of simple UTF-8 or ASCII text files, for example with such content asテスト
orHello world
. This values stores withtext
affinity and is normal for converting tobytea
. Why in this case there is error https://github.com/pgspider/sqlite_fdw/blob/9fd31f43b08ca64006b86d0de316222c213d0081/sqlite_query.c#L93 (look like "text value for blob affinity") ? - There is foreign table with
bigint
column. Sometimes in SQLite there is empty string values (formallytext
affinity) in this column which obviously meansNULL
. I think we can't formally analyse empty strings in not-text columns always astext
affinity. If PostgreSQL type of data in foreign column don't belongs totext
/varchar
family we can threat empty string asNULL
. Maybe there is a reason to add a new option for "threat empty string asNULL
for non text and non blob PostgreSQL columns"?
Thank you for reporting.
SQLite FDW expects/assumes that column type of foreign table and data affinity of SQLite are same, as you know.
It is not supported to map different types which requires implicit type cast (conversion) of data.
---Just a comment:--- To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.
Additionally, we need to consider and care the logic of pushing down of WHERE condition.
For example, SELECT * FROM tbl WHERE col IS NULL
is given to PostgreSQL.
SQLite FDW wants to push down col IS NULL
. If supporting "treat empty string as NULL", SQLite FDW needs to pushdown col = ''
also like SELECT * FROM tbl WHERE col IS NULL or col = ''
on SQLite.
Thanks for comments, @t-kataym ! I made a draft for SELECT
only https://github.com/mkgrgis/sqlite_fdw/commit/db7f918a44ecd1eb00d79e3eff851f6740b36061 Could you point me a line to add col IS NULL or col = ''
mechanism, because it's still hard for me to determine all needed lines for code extension. Maybe something near https://github.com/pgspider/sqlite_fdw/blob/9fd31f43b08ca64006b86d0de316222c213d0081/sqlite_query.c#L214 ?
Test case for text
affinity for bytea
column.
SQLite table
CREATE TABLE "BLOB" (
i INTEGER PRIMARY KEY AUTOINCREMENT,
b BLOB
);
PostgreSQL foreign table
CREATE FOREIGN TABLE public."BLOB" (
i int4 OPTIONS(key 'true') NULL,
b bytea NULL
)
SERVER sqlite_srv;
Data preparing
echo -n "test" | hexdump -C
echo -n "テスト" | hexdump -C
74 65 73 74 = test e3 83 86 e3 82 b9 e3 83 88 = テスト
SQLite INSERT
INSERT INTO "BLOB" ("b") VALUES (x'e38386e382b9e38388');
INSERT INTO "BLOB" ("b") VALUES (x'74657374');
INSERT INTO "BLOB" ("b") VALUES ('テスト');
INSERT INTO "BLOB" ("b") VALUES ('test');
SQLite, affinity check
SELECT i, b, typeof (b) FROM "BLOB";
The same data in DBeaver
During
SELECT
in PostgreSQL there is invalid input for type "bytea" = SQLite "blob", but affinity = "text" for value ='テスト'
.
Practice described above was typical for SQLite-bash interaction for text files without \0
suitable to insert without hexdump
program processing. I think we can threat this practice as common and safe for SQL2016 behaviour because there is no text
in SQLite encodings we can't threat as blob
.
In clean "BLOB" table I also try to test
DELETE FROM "BLOB";
INSERT INTO "BLOB" ("b") VALUES (5);
INSERT INTO "BLOB" ("b") VALUES (8935456748442424);
INSERT INTO "BLOB" ("b") VALUES (32.8);
After this ugly SQL we have successfully SELECT
in PostgreSQL!
It's SQL2016 unsafe while values with int
and real
affinity can successfully converted to PostgreSQL bytea
after something like SQLite to_string
. First it's not per bit representation of this values and second SQL2016 disallowed this transformation at all because bit represetation of numeric types is implementation detail. Why there is disallowed int
/real
transformation to bytea
but error message for relatively safe text
affinity?
For not ugly data for bytea
there is solution draft near https://github.com/mkgrgis/sqlite_fdw/blob/f109f0facf3b85da6b57abb0b7c5b64773997f27/sqlite_query.c#L109
Look like we need CAST
in conditions for bytea
columns of foreign tables.
Depends on affinity, data is from first example
SELECT i, b, typeof (b) FROM "BLOB" where b = 'テスト'; -- 1 row
SELECT i, b, typeof (b) FROM "BLOB" where b = x'e38386e382b9e38388'; -- 1 row
Depends on data only, data is from first example
SELECT i, b, typeof (b) FROM "BLOB" where cast (b as text) = 'テスト'; -- 2 rows
SELECT i, b, typeof (b) FROM "BLOB" where cast (b as blob) = x'e38386e382b9e38388'; -- 2 rows
Now for PostgreSQL with
select * from "blob" where b = 'テスト';
there is base query
デバッグ: sqlite_fdw: finalize SELECT
i,
b FROM main."blob" WHERE ((
b = X'e38386e382b9e38388'))
only 1 row for blob
affinity only.
To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.
@t-kataym , what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?
what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?
@mkgrgis Thank you for summarizing it. Please give us a time to confirm it.
Could you point me a line to add col IS NULL or col = '' mechanism, because it's still hard for me to determine all needed lines for code extension.
Please refer sqlite_deparse_null_test()
which creates IS NULL
expression for SQLite query.
But I think it is unnecessary to understand how to pushdown col IS NULL or col = ''
mechanism because of the following reason.
The correct behavior of FDW is not to pushdown WHERE conditions if "empty_as_non_text_null" is enabled, I think. For example, SQLite table has following record (col1 is an empty string, col2 is NULL).
col1 | col2 |
---|---|
”” | NULL |
If user execute SELECT col1 FROM tbl WHERE col1 = col2;
on PostgreSQL, SQLite FDW should execute SELECT col1 FROM tbl
for SQLite.
To generalize it, I think it is safe not to pushdown WHERE condition which has text column.
what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?
@mkgrgis Thank you for summarizing it. Please give us a time to confirm it.
Yes, please. There is updated table now. I understand it will hard to decide about 17×5=85 datatype×affinity combinations with only 8 obvious cases.
Please refer
sqlite_deparse_null_test()
which createsIS NULL
expression for SQLite query.
Thanks, @t-kataym !
The correct behavior of FDW is not to pushdown WHERE conditions if "empty_as_non_text_null" is enabled, I think.
Yes, it's better for SQL:2016 behaviour on PostgreSQL side.
If user execute
SELECT col1 FROM tbl WHERE col1 = col2;
on PostgreSQL, SQLite FDW should executeSELECT col1 FROM tbl
for SQLite.
I understand your example, @t-kataym, thanks. Really there is a problem, but other.
I thinks it's a bad example. Neither for ''
nor for NULL
there will no true
condition predicate. Empty strings are equal ''
= ''
, but one NULL
isn't equal to other NULL
. If there will be implemented full SQL:2016 behaviour on PostgreSQL side with our FDW, col1 = col2
will always NULL
if one of arguments will be NULL
.
I thinks it's a bad example. Neither for '' nor for NULL there will no true condition predicate. Empty strings are equal '' = '', but one NULL isn't equal to other NULL. If there will be implemented full SQL:2016 behaviour on PostgreSQL side with our FDW, col1 = col2 will always NULL if one of arguments will be NULL.
@mkgrgis Thank you for correcting me. Just my example was not suitable.
Anyway, I believe that you understood what I wanted to say.
Anyway, I believe that you understood what I wanted to say.
Yes, @t-kataym . Let's study about SQL:2016 behaviour in context of prefetched SELECT col1 FROM tbl
for SELECT col1 FROM tbl WHERE col1 = col2;
.
First I have thought WHERE x IS NULL OR x=''
is hard and unsafe. Let's test around int
and uuid
as first examples.
UUID
Well formed UUID in SQLite can have text
or blob
(only if contains 16 bytes) affinity. All BLOBs with 16 bytes of data is correct UUID, but only some texts is correct (see https://sqlite.org/src/file/ext/misc/uuid.c). Also formally mixed affinity blob
/text
is possible for UUID in SQLite. For UUID in SQLite pushdowned IS NULL
is totally unsafe, because for SQLite IS NULL
is only NULL
s, but for PostgreSQL IS NULL
includes all data unusable as well formed uuid
value. Also WHERE uuid = {some const}
may pushdowned in SQLite as search for both blob
and text
forms for simple, but not for STRICT
tables.
Conclusion: anyway no pushdowning needed for uuid
.
int
With PostgreSQL int
we have something other. In PostgreSQL I have SQL CREATE TABLE Test_int" ( i serial4 NOT NULL);
. Let's CREATE TABLE Test_int" ( i int NOT NULL);
in SQLite.
SELECT i FROM "Test_int" where i = '';
is correct for SQLite. For PostgreSQL it's incorrect query with error, hence FDW don't need support this query!
What about joining and JOIN
pushdowning for int
SQLite column with some empty strings? For PostgreSQL it's impossible to join any data for int
(int2
, int8
etc.) column by value equal to empty string ''
. In SQLite it's possible, but it's not SQL:2016 behaviour, we can only note this in documentation (README.md)
For SELECT a.*, b.* FROM a INNER JOIN b ON a.ia = b.ib
in PostgreSQL ia = ''
or ib = ''
is impossible, but for SQLite we can have corresponding rows with NULL
values after FDW. I think it's very rare case hence pusdowning is SQL:2016 safe with note in README.md and this will be usefully for data access speed.
Conclusion: we can pushdown conditions for int
family columns with implementation IS NULL
as IS NULL OR ''
if option discussed above is true
.
About other datatypes let's wait on your opinion and opinion of your department employees, @t-kataym . Its very important to decide about all combinations, because the data transformations is de facto kernel of sqlite_fdw
and users expect from us SQL:2016 (with little exceptions) behaviour of data in foreign tables.
To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.
@t-kataym , what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?
Hello @mkgrgis,
Sorry for late response, I've checked the table and there are some comments:
About description:
'x - error' this description is meaningless it seems a result of a manual test result. Some type transformations do not always error such as SQLITE BLOB to Timestamp as I describe on note-2 below. I think it should be 'not support' for case always error and V for another case.
1. About SQLite INT & REAL transparent:
- SQLite INT/REAL => bytea: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C bytes => Postgres bytea
- SQLite INT/REAL => uuid: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C string => Postgres input function => Postgres uuid
2. About SQLite BLOB transparent:
It does not correct at conversion to date, json, name, text, timestamp, timestamptz, varchar. It should be 'V' instead of 'x':
contrib_regression=# select * from c;
c1
----------------------
\xe38386e382b9e38388
(1 row)
contrib_regression=# alter foreign table c alter column c1 type text;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
c1
--------
テスト
(1 row)
contrib_regression=# alter foreign table c alter column c1 type json;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR: invalid input syntax for type json
DETAIL: Token "テスト" is invalid.
CONTEXT: JSON data, line 1: テスト
contrib_regression=# alter foreign table c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR: invalid input syntax for type date: "テスト"
As you can see, the error just because the text presentation is not match with target type. If the blob column has expected data for text type, the conversion is OK:
sqlite> .schema c
CREATE TABLE c (c1 blob);
sqlite> DELETE FROM c;
sqlite> INSERT INTO c VALUES ('2023-01-01');
contrib_regression=# alter foreign table c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
c1
------------
2023-01-01
3. About SQLite TEXT:
In my understanding sqlite_fdw does not support cast (raise ERROR) only when execute cast from SQLITE TEXT to Postgres int2, int4, int8, bool, float4, float8, numeric and bytea. And sqlite API may return TEXT type for BLOB column.
However, the description of these types is confused, '-' for float4, float8; 'x' for int2, int4; 'V+' for numeric...
About: SQLite TEXT -> date, json, numeric, time,... (V+). The cast behavior is the same: SQLite TEXT -> C string -> Postgres input function -> Postgres data
The behavior is same as SQLite TEXT -> text(✔), varchar(✔), name (V) but the description is not same.
Hello, @nxhai98!
What about we are discussing? There are 3 aspects of transformation table
- as is (current implementation)
- ISO SQL:2016 (recommendations)
- future behaviour
and also there are 2 C language data transformations
-
SELECT
data (SQLite for PostgreSQL) -
WHERE
data (PostgreSQL for SQLite)
My table from previous URL is about future behaviour and mainly (but not only) about SELECT
. I think You are also discussing about this.
About description:
'x - error' this description is meaningless it seems a result of a manual test result.
No, this means only C-language code of sqlite_fdw
will throw here an error of improper data like https://github.com/pgspider/sqlite_fdw/blob/e865a4ddb1a79bb90cec41b2a1f7a33c305f2696/sqlite_query.c#L94
Some type transformations do not always error such as SQLITE BLOB to Timestamp as I describe on note-2 below. I think it should be 'not support' for case always error and V for another case.
Yes, you are right. No support
is ISO SQL sentence. I can use "empty set" sign ∅
for this case, it's more clear.
1. About SQLite INT & REAL transparent:
* SQLite INT/REAL => bytea: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C bytes => Postgres bytea
I have described this case here. It's possible to show SQLite int
as bytea
, but this presentation isn't recommended by ISO SQL as detail of implementation. Yes, Your conception isn't look like current behaviour, but can be usefully for learning SQLite internals for students. I approve Your algorithm.
* SQLite INT/REAL => uuid: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C string => Postgres input function => Postgres uuid
SQLite int
and real
is not longer than 8 byte, uuid
is only 16byte value. Hence all possible input will be improper for uuid
.
2. About SQLite BLOB transparent:
It does not correct at conversion to date, json, name, text, timestamp, timestamptz, varchar. It should be 'V' instead of 'x':
contrib_regression=# select * from c; c1 ---------------------- \xe38386e382b9e38388 (1 row) contrib_regression=# alter foreign table c alter column c1 type text; ALTER FOREIGN TABLE contrib_regression=# select * from c; c1 -------- テスト (1 row)
Please note here was interpretation of BLOB
affinity data as text data with PostgreSQL encoding of current database! We must add this important non ISO SQL aspect in documentation. Also WHERE
(PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text
or blob
in SQLite.
contrib_regression=# alter foreign table c alter column c1 type json; ALTER FOREIGN TABLE contrib_regression=# select * from c; ERROR: invalid input syntax for type json DETAIL: Token "テスト" is invalid. CONTEXT: JSON data, line 1: テスト contrib_regression=# alter foreign table c alter column c1 type date; ALTER FOREIGN TABLE contrib_regression=# select * from c; ERROR: invalid input syntax for type date: "テスト"
As you can see, the error just because the text presentation is not match with target type. If the blob column has expected data for text type, the conversion is OK:
sqlite> .schema c CREATE TABLE c (c1 blob); sqlite> DELETE FROM c; sqlite> INSERT INTO c VALUES ('2023-01-01'); contrib_regression=# alter foreign table c alter column c1 type date; ALTER FOREIGN TABLE contrib_regression=# select * from c; c1 ------------ 2023-01-01
Yes, with cast to encoding of current PostgreSQL database all this cases сan be marked as V (transparent transformation), but I added new T value "cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable". Max length of BLOB
data in SQLite is very big and also applicable to text
, hence sometimes we can check BLOB
or text
length, especially for types from datatime
family, uuid
, name
, 'bool', 'int', maybe also for varchar
with a determined length. Now all values are fetched even if obviously malformed because are very long.
3. About SQLite TEXT:
In my understanding sqlite_fdw does not support cast (raise ERROR) only when execute cast from SQLITE TEXT to Postgres int2, int4, int8, bool, float4, float8, numeric and bytea.
Yes, it's normal for ISO SQL:2016 and in my opinion.
And sqlite API may return TEXT type for BLOB column.
As SQLite C language text-like output SQLite BLOB data really can be more flexible for next transformations. But for PostgreSQL C function this input will not obvious. Yes, text
and blob
affinity usually have equal size limits, but there are many blob
s with no available text presentation in current encoding of PostreSQL database. As example UTF-8 have improper byte sequences cast result can be malformed.
However, the description of these types is confused, '-' for float4, float8; 'x' for int2, int4; 'V+' for numeric...
Marked as V+, but most of well-formed values will store as values with real
affinity. Some hex
or octal integer numbers can have text
affinity.
About: SQLite TEXT -> date, json, numeric, time,... (V+). The cast behavior is the same: SQLite TEXT -> C string -> Postgres input function -> Postgres data
The behavior is same as SQLite TEXT -> text(✔), varchar(✔), name (V) but the description is not same.
In ISO SQL varchar
without length and text
is the nearest equivalent of text
affinity, but name
always have length and formally can be truncated. I meant only this difference.
About SQLite BLOB transparent
As you can see, the error just because the text presentation is not match with target type.
Not only presentation, @nxhai98, but also encoding! Please note, Unicode databases like UTF-8/UTF-16/UTF-32 isn't only possible. Yes, SQLite recommends threat all text as Unicode data, but we have no such recommendations about BLOBs .
Also WHERE
(PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text
or blob
in SQLite. Can we conceptually write flexible WHERE
PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB
and text
forms?
Thank for the feedback @mkgrgis,
My table from previous URL is about future behaviour and mainly (but not only) about SELECT. I think You are also discussing about this.
Yes, correct.
I have described this case https://github.com/pgspider/sqlite_fdw/issues/66#issuecomment-1455789778. It's possible to show SQLite int as bytea, but this presentation isn't recommended by ISO SQL as detail of implementation. Yes, Your conception isn't look like current behaviour, but can be usefully for learning SQLite internals for students. I approve Your algorithm.
Thank for explaining. However, SQLite INT/REAL => bytea still 'x' in the datatype table. Could you update this.
SQLite int and real is not longer than 8 byte, uuid is only 16byte value. Hence all possible input will be improper for uuid
Thank for explaining, I understood.
Please note here was interpretation of BLOB affinity data as text data with PostgreSQL encoding of current database! We must add this important non ISO SQL aspect in documentation. Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite.
Currently, sqlite_fdw support only utf-8 SQLite database and don't consider about current Postgres encoding.
Yes, with cast to encoding of current PostgreSQL database all this cases сan be marked as V (transparent transformation), but I added new T value "cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable". Max length of BLOB data in SQLite is very big and also applicable to text, hence sometimes we can check BLOB or text length, especially for types from datatime family, uuid, name, 'bool', 'int', maybe also for varchar with a determined length. Now all values are fetched even if obviously malformed because are very long.
Currently, sqlite_fdw always uses sqlite3 APIs to cast value to text/int/float and always 'utf-8' for text. So, the T value cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable
is not correct.
How about: cast to text in SQLite utf-8 encoding and then transparent transformation if applicable
?
As SQLite C language text-like output SQLite BLOB data really can be more flexible for next transformations. But for PostgreSQL C function this input will not obvious. Yes, text and blob affinity usually have equal size limits, but there are many blobs with no available text presentation in current encoding of PostreSQL database. As example UTF-8 have improper byte sequences cast result can be malformed.
sqlite_fdw only get BLOB data directly when postgres column type is bytea, with other data type, BLOB is cast to int/float/text by sqlite3 API first and only UTF-8 supported. The cast: Sqlite BLOB -> sqlite3_column_text (utf-8: unsigned char *) -> Postgres input function.
Not only presentation, @nxhai98, but also encoding! Please note, Unicode databases like UTF-8/UTF-16/UTF-32 isn't only possible. Yes, SQLite recommends threat all text as Unicode data, but we have no such recommendations about BLOBs . Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite. Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?
Do you mean:
postgres: select * from t2 where bytea_col = 'test';
=> sqlite execute: select * from t2 where bytea_col = 'test';
postgres: select * from t2 where bytea_col = '\x74657374';
=> sqlite execute: select * from t2 where bytea_col = '\x74657374';
If so, his may not feasible on FDW because there is implicit cast in OP: bytea_col = 'test' this corresponding with bytea_col = 'test'::bytea and the implicit cast is executed before we get this value in FDW (when build sqlite query):
explain verbose select * from tbl where bytea_col = 'test';
QUERY PLAN
------------------------------------------------------------
Seq Scan on public.tbl (cost=0.00..21.00 rows=4 width=64)
Output: bytea_col, text_col
Filter: (tbl.bytea_col = '\x74657374'::bytea)
(3 rows)
I check again the table:
- description of SQLite TEXT => float4/float8 (-), int2/int4/int8 (x) to all V+. With current behavior: sqlite_fdw does not support these cast => it should be (∅) or (-)
- Also with SQLite TEXT => numeric (V+) and bytea (-), sqlite_fdw does not support these cast => it should be (∅) or (-)
- The cast SQLite BLOB => float4/float8/int2/int4/int8 (b - show per-bit form) the current behavior (cast behavior of SQLite API) is: SQLite BLOB -> SQLite TEXT -> C number: https://www.sqlite.org/lang_expr.html#castexpr
When casting a BLOB value to INTEGER, the value is first converted to TEXT.
=> this mark b - show per-bit form is not correct -> it should be same as another cast of SQLite BLOB (T).
- The cast SQLite BLOB -> numeric (∅) => it should be (V+): the cast behavior: SQLite BLOB -> SQLite TEXT (sqlite3_column_text()) -> input function of postgres
Thank for explaining. However, SQLite INT/REAL => bytea still 'x' in the datatype table. Could you update this.
Yes, fixed, @nxhai98!
Currently, sqlite_fdw support only utf-8 SQLite database and don't consider about current Postgres encoding.
It's important don't consider about current Postgres encoding! UTF-8 output of SQLite is not problem, also UTF-16 SQLite databases is not problem also. What if there will be some EUC_JP
, EUC_CN
or WIN866
PostgreSQL? Maybe we should mark all text output of SQLite as UTF-8 input for PostgreSQL?
Currently, sqlite_fdw always uses sqlite3 APIs to cast value to text/int/float and always 'utf-8' for text. So, the T value
cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable
is not correct. How about:cast to text in SQLite utf-8 encoding and then transparent transformation if applicable
?
More clear is
... cast to text in SQLite utf-8 encoding, then to PostgreSQL text with current encoding of database* and then transparent transformation if applicable ...
sqlite_fdw only get BLOB data directly when postgres column type is bytea, with other data type, BLOB is cast to int/float/text by sqlite3 API first and only UTF-8 supported. The cast: Sqlite BLOB -> sqlite3_column_text (utf-8: unsigned char *) -> Postgres input function.
Thanks. So, SQLite blob as text will be utf-8. It's good
... Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?
Do you mean:
postgres: select * from t2 where bytea_col = 'test'; => sqlite execute: select * from t2 where bytea_col = 'test'; postgres: select * from t2 where bytea_col = '\x74657374'; => sqlite execute: select * from t2 where bytea_col = '\x74657374';
No, I meant for both cases something like
sqlite execute: select * from t2 where bytea_col = '\x74657374' or bytea_col = 'test';
Because for non-STRICT
tables real data can be text
or BLOB
and we have no guarantee.
For bytea
search not sqlite execute: select * from t2 where bytea_col = '\x74657374';
, but sqlite execute: select * from t2 where cast(bytea_col as blob) = '\x74657374';
For text search not sqlite execute: select * from t2 where bytea_col = 'test';
but > sqlite execute: select * from t2 where cast (bytea_col as text) = 'test';
If so, his may not feasible on FDW because there is implicit cast in OP: bytea_col = 'test' this corresponding with bytea_col = 'test'::bytea and the implicit cast is executed before we get this value in FDW (when build sqlite query):
explain verbose select * from tbl where bytea_col = 'test'; QUERY PLAN ------------------------------------------------------------ Seq Scan on public.tbl (cost=0.00..21.00 rows=4 width=64) Output: bytea_col, text_col Filter: (tbl.bytea_col = '\x74657374'::bytea) (3 rows)
Yes, I am about similar problem.
I check again the table:
* description of SQLite TEXT => float4/float8 (-), int2/int4/int8 (x) to all V+. With current behavior: sqlite_fdw does not support these cast => it should be (∅) or (-)
Marked as -
, thanks.
* Also with SQLite TEXT => numeric (V+) and bytea (-), sqlite_fdw does not support these cast => it should be (∅) or (-)
Fixed. ∅
for numeric
, -
for bytea
. We can thereat SQLite text
as UTF-8 text blob
.
* The cast SQLite BLOB => float4/float8/int2/int4/int8 (b - show per-bit form) the current behavior (cast behavior of SQLite API) is: SQLite BLOB -> SQLite TEXT -> C number: https://www.sqlite.org/lang_expr.html#castexpr * The cast SQLite BLOB -> numeric (∅) => it should be (V+): the cast behavior: SQLite BLOB -> SQLite TEXT (sqlite3_column_text()) -> input function of postgres
Fixed. Please carefully review the table again, @nxhai98. I'll come back at the end of next week after studying about testing sqlite_fdw
in PostgreSQL source code tree.
@mkgrgis Thank for fixing, I confirmed the table.
It's important don't consider about current Postgres encoding! UTF-8 output of SQLite is not problem, also UTF-16 SQLite databases is not problem also. What if there will be some EUC_JP, EUC_CN or WIN866 PostgreSQL? Maybe we should mark all text output of SQLite as UTF-8 input for PostgreSQL?
I understood there is a problem about not match character set. However, sqlite_fdw does not support encoding conversion, it assumes that encoding of SQLite and Postgres are the same.
If you want to fix it, here is a suggestion:
Postgres's core provides a function:
/*
* Convert src string to another encoding (general case).
*
* See the notes about string conversion functions at the top of this file.
*/
unsigned char *
pg_do_encoding_conversion(unsigned char *src, int len, int src_encoding, int dest_encoding);
It can apply as below to sqlite_fdw:
char *utf8 = (char *) sqlite3_column_text(stmt, colid); // <-- assumes SQLite text is always UTF8
char *valstr = pg_do_encoding_conversion(utf8, strlen(utf8), PG_UTF8, GetDatabaseEncoding()); // <-- convert from utf8 to current database encoding
value_datum = InputFunctionCall(&attinmeta->attinfuncs[attnum],
valstr,
attinmeta->attioparams[attnum],
attinmeta->atttypmods[attnum]);
But it is not a complete solution.
Ideally, sqlite_fdw needs to know encoding of both PostgreSQL and SQLite, and converts data if they are different.
It also required encoding conversion when build query with text constant (in sqlite_deparse_const()
function).
No, I meant for both cases something like sqlite execute: select * from t2 where bytea_col = '\x74657374' or bytea_col = 'test'; Because for non-STRICT tables real data can be text or BLOB and we have no guarantee. For bytea search not sqlite execute: select * from t2 where bytea_col = '\x74657374'; , but sqlite execute: select * from t2 where cast(bytea_col as blob) = '\x74657374'; For text search not sqlite execute: select * from t2 where bytea_col = 'test'; but > sqlite execute: select * from t2 where cast (bytea_col as text) = 'test';
I understood your spec, but it seems not feasible because of we do not know when bytea_col
is compared with text constant.
How do you think about this spec below:
For bytea search: treat BLOB column as BYTEA column, we add the implicit cast for all blob columns when build SQLite query (deparse):
Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = '\x74657374';
-> sqlite: SELECT CAST(bytea_col AS BLOB) FROM t2 WHERE CAST(bytea_col AS BLOB) = x'74657374';
For text search: treat BLOB column as TEXT column, user must be define BLOB column as text column in foreign table and add implicit cast for all text columns when build SQLite query (deparse):
CREATE FOREIGN TABLE t2 (bytea_col TEXT) SERVER sqlite_svr; <- define BLOB column as TEXT column on postgres
Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = 'test';
-> sqlite: SELECT CAST(bytea_col AS TEXT) FROM t2 WHERE CAST(bytea_col AS TEXT) = 'test';
Here is an example: SQLite:
sqlite> CREATE TABLE BLOB_TBL(c1 BLOB);
sqlite> INSERT INTO BLOB_TBL VALUES (x'e38386e382b9e38388');
sqlite> INSERT INTO BLOB_TBL VALUES ('test');
sqlite> INSERT INTO BLOB_TBL VALUES (x'74657374');;
sqlite> SELECT c1, typeof(c1) FROM BLOB_TBL;
テスト|blob
test|text
test|blob
PostgreSQL:
test=# CREATE FOREIGN TABLE "BLOB_TBL" (c1 text) SERVER sqlite_svr ; <- define BLOB column as TEXT column
CREATE FOREIGN TABLE
test=# select * from "BLOB_TBL";
c1
--------
テスト
test
test
(3 rows)
test=# explain verbose select * from "BLOB_TBL" where c1 = 'test';
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Foreign Scan on public."BLOB_TBL" (cost=10.00..7.00 rows=7 width=32)
Output: c1
SQLite query: SELECT CAST (`c1` AS TEXT) FROM main."BLOB_TBL" WHERE ((CAST (`c1` AS TEXT) = 'test'))
(3 rows)
test=# select * from "BLOB_TBL" where c1 = 'test';
c1
------
test <- this is TEXT type on SQLite
test <- this is BLOB type on SQLite
(2 rows)
Thanks, @nxhai98 ! I am sorry for waiting.
@mkgrgis Thank for fixing, I confirmed the table.
Let's use this table ;-)
char *utf8 = (char *) sqlite3_column_text(stmt, colid); // <-- assumes SQLite text is always UTF8 char *valstr = pg_do_encoding_conversion(utf8, strlen(utf8), PG_UTF8, GetDatabaseEncoding()); // <-- convert from utf8 to current database encoding value_datum = ....
Yes, this is full SELECT
solution I meant, @nxhai98 ! And also we need something like this in sqlite_deparse_const()
for WHERE
solution, You are completely right! We have guarantees for always and only UTF-8 for sqlite3_column_text
output independent on PRAGMA encoding
in SQLite. Hence we have only PostgreSQL encoding problems, no similar in SQLite.
Also important function is sqlite3_column_bytes
for empty text
and empty blob
data in SQLite that treated as special data transformation case.
Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = '\x74657374'; -> sqlite: SELECT CAST(bytea_col AS BLOB) FROM t2 WHERE CAST(bytea_col AS BLOB) = x'74657374';
CREATE FOREIGN TABLE t2 (bytea_col TEXT) SERVER sqlite_svr; <- define BLOB column as TEXT column on postgres Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = 'test'; -> sqlite: SELECT CAST(bytea_col AS TEXT) FROM t2 WHERE CAST(bytea_col AS TEXT) = 'test';
Nice! Your elegant solution is better than my mechanical solution. I consider this proposals and tests as SQL:2016 recommended for our cases.
How we will begin a draft for the table, @nxhai98 ?
I hope some my drafts will be usefully for full data transformation implementation:
- static const char* sqlite_datatype(int t)
- Text affinity and datatype variables
- Human readable message about runtime data transformation error
What do You think about algorithm structure? Maybe it will be usefully to pack all transformation table logic only in switch (pgtyp)
without any pre-operations and write special small runtime_data_transformation_error
function?
@mkgrgis, thank for your fb,
What do You think about algorithm structure? Maybe it will be usefully to pack all transformation table logic only in switch (pgtyp) without any pre-operations and write special small runtime_data_transformation_error function?
Yes, I think it may better, I see your draft, sqlite_convert_to_pg
looks complicated, pack all logical on switch (pgtyp)
should be easier to understand spec.
BTW, we have just pushed the bug fixing related to sqlite_convert_to_pg
function, could you check and merge them to your implementation?
Thanks, @nxhai98 ! Your contribution in https://github.com/pgspider/sqlite_fdw/pull/71 is very interesting. I am learning the changes. I'll create a new branch based on this first with little C-code changes without any behaviour modifications and second with step-by-step PRs with small behaviour changes by our table.
@t-kataym and @nxhai98, I have made a PR without changing SQL behaviour to prepare implementation of data transformation table here discussed. Please review https://github.com/pgspider/sqlite_fdw/pull/74. My plan for next little PRs:
- Add separate runtime error function for future cases with impossible data converting
- Add text transformation to respect PostgreSQL database encoding
- Many line-by-line little PRs for the data transformation table here discussed
Thanks @t-kataym and @nxhai98 ! I have made PR https://github.com/pgspider/sqlite_fdw/pull/75 with separate function for data type error message. My plan for next little PRs:
- Add text transformation to respect PostgreSQL database encoding (this behaviour doesn't have a tests now)
- Many line-by-line little PRs for the data transformation table.
Thanks @t-kataym and @nxhai98 ! I have made PR https://github.com/pgspider/sqlite_fdw/pull/76 with text encoding transformation.
My next PRs will be about implementation of data transformation table.
@mkgrgis Thank you for your development. Please request @nxhai98 to review the changes on PR #76 if you finished.
@t-kataym, how can I request @nxhai98 to review my PR? I can only write here my request for @nxhai98, but not something more. I think reviewers for a PR can be assigned only by repository owners.
@mkgrgis Could you describe a comment in PR #76 after finished? It seems that you are in progress because test cases has not been added yet.
@t-kataym, can anybody help me with testing conception(plan) in https://github.com/pgspider/sqlite_fdw/pull/76#issuecomment-1567740629 ? I am frustrated during adding a new tests with database encodings.
@mkgrgis Could you ask @bichht0608 about testing?
@mkgrgis Could you ask @bichht0608 about testing?
Yes, @t-kataym . I have https://github.com/pgspider/sqlite_fdw/pull/76#issuecomment-1567874357 asked.
@t-kataym in https://github.com/pgspider/sqlite_fdw/pull/76 extended tests after review by @bichht0608 are ready. When should I get opinion of pgspider team?
In next PR there will implemented formally switch
in switch
grid by affinity-datatype table without SQL behaviour change.
In other PR after next PR there will be new tests and new implementation according some cells of the table.