sqlite_fdw icon indicating copy to clipboard operation
sqlite_fdw copied to clipboard

push down of `ORDER BY` on indexed float columns ends up doing a seq scan

Open dgwynne opened this issue 11 months ago • 13 comments

say i have a table like this in sqlite:

sqlite> .schema states
CREATE TABLE IF NOT EXISTS "states" (
	state_id INTEGER NOT NULL, 
	entity_id CHAR(0), 
	state VARCHAR(255), 
	attributes CHAR(0), 
	event_id SMALLINT, 
	last_changed CHAR(0), 
	last_changed_ts FLOAT, 
	last_reported_ts FLOAT, 
	last_updated CHAR(0), 
	last_updated_ts FLOAT, 
	old_state_id INTEGER, 
	attributes_id INTEGER, 
	context_id CHAR(0), 
	context_user_id CHAR(0), 
	context_parent_id CHAR(0), 
	origin_idx SMALLINT, 
	context_id_bin BLOB, 
	context_user_id_bin BLOB, 
	context_parent_id_bin BLOB, 
	metadata_id INTEGER, 
	PRIMARY KEY (state_id), 
	FOREIGN KEY(old_state_id) REFERENCES states (state_id), 
	FOREIGN KEY(attributes_id) REFERENCES state_attributes (attributes_id), 
	FOREIGN KEY(metadata_id) REFERENCES states_meta (metadata_id)
);
CREATE INDEX ix_states_context_id_bin ON states (context_id_bin);
CREATE INDEX ix_states_metadata_id_last_updated_ts ON states (metadata_id, last_updated_ts);
CREATE INDEX ix_states_attributes_id ON states (attributes_id);
CREATE INDEX ix_states_last_updated_ts ON states (last_updated_ts);
CREATE INDEX ix_states_old_state_id ON states (old_state_id);

and I have the following foreign table in pg for it:

hass=# \d states
                                 Foreign table "public.states"
        Column         |          Type          | Collation | Nullable | Default | FDW options  
-----------------------+------------------------+-----------+----------+---------+--------------
 state_id              | integer                |           |          |         | (key 'true')
 metadata_id           | integer                |           |          |         | 
 state                 | character varying(255) |           |          |         | 
 last_changed_ts       | double precision       |           |          |         | 
 last_updated_ts       | double precision       |           |          |         | 
 old_state_id          | integer                |           |          |         | 
 attributes_id         | integer                |           |          |         | 
 context_id_bin        | bytea                  |           |          |         | 
 context_user_id_bin   | bytea                  |           |          |         | 
 context_parent_id_bin | bytea                  |           |          |         | 
 origin_idx            | integer                |           |          |         | 
Server: hass
FDW options: ("table" 'states')

a query that tries to sort by the last_updated_ts column doesn't use the index in sqlite because it's accessed via sqlite_fdw_float():

hass=# explain analyze verbose select * from states order by last_updated_ts desc limit 10;
                                                                                                                                                                   QUERY PLAN                                                                                                                      
                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 Foreign Scan on public.states  (cost=1.00..1.00 rows=1 width=648) (actual time=1973.694..1973.712 rows=10 loops=1)
   Output: state_id, metadata_id, state, last_changed_ts, last_updated_ts, old_state_id, attributes_id, context_id_bin, context_user_id_bin, context_parent_id_bin, origin_idx
   SQLite query: SELECT `state_id`, `metadata_id`, `state`, sqlite_fdw_float(`last_changed_ts`), sqlite_fdw_float(`last_updated_ts`), `old_state_id`, `attributes_id`, `context_id_bin`, `context_user_id_bin`, `context_parent_id_bin`, `origin_idx` FROM main."states" ORDER BY sqlite_fdw_float(
`last_updated_ts`) DESC NULLS FIRST LIMIT 10
 Planning Time: 0.096 ms
 Execution Time: 1973.816 ms
(5 rows)

a similar query via sqlite itself takes almost no time:

dlg@defeat hass$ time sqlite3 home-assistant_v2.db 'select state_id,metadata_id,state,last_updated_ts from states order by last_updated_ts desc limit 10'
265708270|848|off|1735884102.15356
265708269|609|5.9|1735884102.15348
265708268|475|off|1735884102.15329
265708267|8171|5849.6|1735884099.05672
265708266|633|8.3|1735884099.05631
265708265|830|278.1|1735884099.05625
265708264|533|1640.79|1735884099.05617
265708263|439|6868|1735884099.05609
265708262|304|5222.36|1735884099.05602
265708261|608|5.9|1735884099.05593

real	0m0.002s
user	0m0.001s
sys	0m0.001s

would it be possible to order by the "unwrapped" float value?

dgwynne avatar Jan 03 '25 06:01 dgwynne

would it be possible to order by the "unwrapped" float value?

If in a table will be Infinity or -Inf or NaN or other special IEEE value value with text affinity, here will no ISO:SQL result, but very strange order of values. Does you need this, @dgwynne ? UPD: If yes, I can help you how to implement column option for fast, but not ISO:SQL behaviour in case of you are sure there are only values with real affinity.

mkgrgis avatar Jan 03 '25 06:01 mkgrgis

In this situation there are only reals. there's no infinity, nans, etc.

dgwynne avatar Jan 03 '25 08:01 dgwynne

In this situation there are only reals. there's no infinity, nans, etc.

I think, you potentially can use option column_type with real value for fast implementation of processing for your case. This is key lines of C code https://github.com/pgspider/sqlite_fdw/blob/af42d07f66c532acf91f099ba011e6feea86afc7/deparse.c#L2258-L2269 options usage example is also in the same function and near https://github.com/pgspider/sqlite_fdw/blob/af42d07f66c532acf91f099ba011e6feea86afc7/deparse.c#L2224-L2235 Testcases (TCs) about your situation you can add after https://github.com/pgspider/sqlite_fdw/blob/af42d07f66c532acf91f099ba011e6feea86afc7/sql/17.0/types/float4.sql#L853 This can be something like

ALTER TABLE ... ALTER COLUMN "i" OPTIONS (ADD column_type 'real');
-- Testcase {NEXT}
EXPLAIN (VERBOSE, COSTS OFF)
{YOUR QUERY}

mkgrgis avatar Jan 03 '25 08:01 mkgrgis

@dgwynne , you can base on my draft and add needed TCs. Are you ready to support your own PR with requested functional and regress tests for saving this functional after future PRs?

mkgrgis avatar Jan 03 '25 09:01 mkgrgis

i can try

dgwynne avatar Jan 03 '25 09:01 dgwynne

i can try

Please refer https://github.com/pgspider/fdw_testing_scripts and do reset; ../new_mulver_cycle.sh sqlite_fdw ''; between your changes. Happy implementation and debugging!

mkgrgis avatar Jan 03 '25 09:01 mkgrgis

https://github.com/dgwynne/sqlite_fdw/tree/pure_float seems to work. your draft was very close, just needed one tweak.

i just have to figure test cases out now.

dgwynne avatar Jan 03 '25 10:01 dgwynne

@dgwynne , we have forgotten

if (qualify_col)
ADD_REL_QUALIFIER(buf, varno)

for code without unification. This is obligatory macros for table name before column name if needed.

mkgrgis avatar Jan 03 '25 13:01 mkgrgis

@dgwynne , we have forgotten

if (qualify_col)
ADD_REL_QUALIFIER(buf, varno)

for code without unification. This is obligatory macros for table name before column name if needed.

i've just pushed this to my branch.

still need to look at tests.

dgwynne avatar Jan 05 '25 23:01 dgwynne

still need to look at tests.

@dgwynne , which of tests you can repeat for a new implementation? Look at https://github.com/pgspider/sqlite_fdw/commit/c81bc847570fa25f545dcbde50e5ef356ed86555 and add (repeat) equal test set for your implementation. For example SELECT c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c17, c18, c19, c2, c21, c22, c23, c24 FROM alltypetest; gives SQLite query: SELECT `c1`, `c2`, `c3`, `c4`, `c5`, `c6`, `c7`, `c8`, `c9`, `c10`, `c11`, `c12`, `c13`, `c14`, `c15`, sqlite_fdw_float(`c17`), sqlite_fdw_float(`c18`), sqlite_fdw_float(`c19`), sqlite_fdw_float(`c21`), sqlite_fdw_float(`c22`),c23, c24 FROM alltypetest;. You can set a new option to some columns and test EXPALIN VERBOSE again. Other ideas see in the commit.

mkgrgis avatar Jan 06 '25 05:01 mkgrgis

@dgwynne @mkgrgis Let me confirm about the problem. If we use ORDER BY last_updated_ts, it takes almost no time. If we use ORDER BY sqlite_fdw_float(last_updated_ts), it takes much more time. Is my understanding correct?

lamdn1409 avatar Jan 07 '25 08:01 lamdn1409

@lamdn1409 , I think you did complete description of problem. Isn't it, @dgwynne ? Also I can add sqlite_fdw_float is a deterministic function for SQLite execution context, hence it potentially can be cached.

mkgrgis avatar Jan 07 '25 19:01 mkgrgis

Ping, @dgwynne ! Any problems? My plan of new TCs:

  • Explain
  • SELECT -> error because of any ∞ value with text affinity
  • DELETE ... WHERE typeof(...) = 'text'
  • SELECT -> OK

mkgrgis avatar Feb 04 '25 08:02 mkgrgis