Repeated columns in SELECT silently dropped with -dialect SQLite
Which one's output is correct? sqlite, two columns:
$ { echo .headers on; echo 'SELECT code,code FROM extent LIMIT 1;';}|\
$ sqlite3 -csv /usr/share/proj/proj.db
code,code
1024,1024
Or ogr2ogr -dialect SQLite, one column:
$ ogr2ogr /vsistdout/ -f CSV raw.csv \
-sql 'SELECT name,name FROM raw LIMIT 1' -dialect SQLite
name
"8145"
The second name gets thrown away just as if it didn't exist in the SELECT statement.
Well, at least with no -dialect SQLite we manage to get a warning:
$ ogr2ogr /vsistdout/ -f CSV \
raw.csv -sql 'SELECT Name,Name FROM raw LIMIT 1'
Warning 1: Field 'Name' already exists. Renaming it as 'Name2'
Name,Name2
"8145","8145"
from which we come up with a workaround for -dialect SQLite:
$ ogr2ogr /vsistdout/ -f CSV raw.csv -sql \
'SELECT Name,Name AS Name2 FROM raw LIMIT 1' -dialect SQLite
name,Name2
"8145","8145"
$ ogr2ogr --version
GDAL 3.10.3, released 2025/04/01
Using correct SQL is not a workaround, it is a solution. The question is how hard we should work for informing users about what they do wrong. And warnign users about all variations is not as simple as it may feel.
CSV is probably the only so called format that allows having dublicate column names like in
code,code
1024,1024
I would say that having such CSV data is an user error. Especially if the data in the columns are not the same, but like
code,code
1024,2048
It may feel that SQLite or PostGIS can handle the dublicated column names created with an SQL query because you can do
SELECT code,code FROM extent LIMIT 1;
However, when you do something similar than what ogr2ogr does, for creating a new table
CREATE TABLE foo AS SELECT code,code FROM extent LIMIT 1;
then SQLite renames the second columns automatically and you will have columns code and code:1.
However, with PostGIS all you will get is an error:
ERROR: column "code" specified more than once
I couldn't figure out which one is right even with the help of AI.
Anyway, I often use duplicate columns. E.g., putting an elevation into a label.
I actually use that issue as a feature when doing things like
SELECT ST_something() AS geometry, * FROM ...
so that the columns with the ST_ function is used as the geometry function and overwrites the original geometry. The probably cleaner way would be to use "* EXCLUDE (geometry)", but SQLite doesn't support it
What you mean by "which one is right"? Using alias is right. Using dublicate field names is something unexpected and programs behave in different ways in that situation. I will update my previous comment soon.
I tried Googling to find the official way SQL should work in such a situation but could not find the answer.
I tried Googling to find the official way SQL should work in such a situation but could not find the answer.
When it comes to the "select" part, the behavior of select code,code from raw is covered around page 50 in the ANSI SQL-86 standard https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub127.pdf. The expected names in the result are code,code. I did not think too much about what they should be when the "value expression" is a function. By experiment it seems that behavior varies according to the SQL engine. select 1,1,1 from raw; gives column names 1,1,1 with SQLite, but ?column?,?column?-2,column?-3 with PostgreSQL.
When it comes to writing side, the standard says: The <column name> shall be different from the <column name> of any other <column definition> in the containing <table definition>. The standard does not say anything about what to do if user tries to violate the rule.
Link to SQL-86 found from https://modern-sql.com/standard. Nome of them defined what GDAL should do when it has data with dublicate column names, it is a different thing. Myself I would edit my data to be non-ambiguous.
I actually use that issue as a feature when doing things like
SELECT ST_something() AS geometry, * FROM ...
so that the columns with the ST_ function is used as the geometry function and overwrites the original geometry. The probably cleaner way would be to use "* EXCLUDE (geometry)", but SQLite doesn't support it
oh nice, I never understood before why that was ok
Just like if one requests
$ echo colour | fixspell™® -dialect American-English
Here the user requested -dialect American-English
so the fixspell™® program had better deliver color, not colour
else it is violating its -dialect promise.
If it is going to violate its -dialect promise then at least it needs to make a peep on STDERR.
It doesn't even need to justify its actions, but at least it needs to alert the user.
Likewise, with ogr2ogr -dialect SQLite, the user is asking for -dialect SQLite, dubious SQL
or not, wrong or right. Anything less deserves at least some message on STDERR.
Same in fact for https://github.com/OSGeo/gdal/issues/11890 .
Perhaps the message could be "Invalid SQL, even if perhaps valid SQLite. Quitting."
Do you admit that "create table as select code,code from raw" cannot work as-is in any SQL database?
Do you admit that "create table as select code,code from raw" cannot work as-is in any SQL database?
I am sorry but that is about three times more complex than my paltry knowledge of SQL.