SQL: Row identifier as a pseudo-column

Open philrz opened this issue 8 months ago • 0 comments

Many SQL databases have the concept of a numeric "row identifier" that is always available on persistent tables as a pseudo/hidden column. It comes with caveats so it may not be a priority for SuperDB to support it, but may be worth consideration since some SQL users may be accustomed to having it available.

Details

At the time this issue is being opened, super is at commit cabf51f.

The concept of a "ROWID Pseudocolumn" was allegedly first introduced by Oracle. It's present in other popular open source SQL implementations such as DuckDB, where I happened to bump into it in one of their sqllogictests.

Starting with input data.csv:

a
"foo"
"bar"
"baz"

The value is available in DuckDB as rowid. Even as the row order changes, the value is stable within the transaction.

$ duckdb my.db
DuckDB v1.3.0 (Ossivalis) 71c5c07cdd
Enter ".help" for usage hints.

D CREATE TABLE foo AS SELECT * FROM read_csv('data.csv');

D SELECT rowid,a FROM foo;
┌───────┬─────────┐
│ rowid │    a    │
│ int64 │ varchar │
├───────┼─────────┤
│     0 │ foo     │
│     1 │ bar     │
│     2 │ baz     │
└───────┴─────────┘

D SELECT rowid,a FROM foo ORDER BY a;;
┌───────┬─────────┐
│ rowid │    a    │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ bar     │
│     2 │ baz     │
│     0 │ foo     │
└───────┴─────────┘

Note however that it's only available on persistent tables. E.g., this doesn't work when querying directly from a CSV file.

$ duckdb -c "SELECT rowid,a FROM 'data.csv';"
Binder Error:
Referenced column "rowid" not found in FROM clause!
Candidate bindings: "a"

LINE 1: SELECT rowid,a FROM 'data.csv';
               ^

Something similar is available in Postgres, but there's it's called ctid. Note that their numbering happens to start from 1 while DuckDB's started from 0. SQL be SQLin'!

$ psql postgres
psql (17.5 (Homebrew))
Type "help" for help.

postgres=# SELECT ctid,a FROM foo;
 ctid  |  a  
-------+-----
 (0,1) | foo
 (0,2) | bar
 (0,3) | baz
(3 rows)

postgres=# SELECT ctid,a FROM foo ORDER BY foo;
 ctid  |  a  
-------+-----
 (0,2) | bar
 (0,3) | baz
 (0,1) | foo
(3 rows)

The docs for all these systems advise against treating it like a primary key since it can change due to operations like INSERT, DELETE, and VACUUM. But no doubt for simple load & query operations it could be quite handy.

Jun 17 '25 20:06 philrz