Filters fail to work correctly against calculated numeric columns returned by SQL views because type affinity rules do not apply

Open rayvoelker opened this issue 2 years ago • 8 comments

I found some strange behavior. I'm not sure whether it's related to views and boolean values or whether something else is going on, but here is an example that reproduces what I'm seeing.

#!/bin/bash

echo "\"id\",\"expiration_date\"
0,2018-01-04
1,2019-01-05
2,2020-01-06
3,2021-01-07
4,2022-01-08
5,2023-01-09
6,2024-01-10
7,2025-01-11
8,2026-01-12
9,2027-01-13
" > test.csv
csvs-to-sqlite test.csv test.db
sqlite-utils create-view --replace test.db test_view "select id, expiration_date, case when julianday('NOW') >= julianday(expiration_date) then 1 else 0 end as has_expired FROM test"
datasette test.db

Thanks again and let me know if you want me to provide anything else!

rayvoelker avatar Mar 20 '22 19:03 rayvoelker

Oh this is fascinating! I replicated the bug (thanks for the steps to reproduce) and it looks like this is down to the following:

Against views, where has_expired = 1 returns different results from where has_expired = '1'

This doesn't happen against tables because of SQLite's type affinity mechanism, which handles the type conversion automatically.
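
A minimal sketch of that difference using Python's built-in sqlite3 module. It rebuilds the table and view from the reproduction steps in memory; the fixed date '2022-03-20' in place of 'NOW' is an assumption made here so the counts stay stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table test (id integer, expiration_date text);
    insert into test values
      (0, '2018-01-04'), (1, '2019-01-05'), (2, '2020-01-06'),
      (3, '2021-01-07'), (4, '2022-01-08'), (5, '2023-01-09'),
      (6, '2024-01-10'), (7, '2025-01-11'), (8, '2026-01-12'),
      (9, '2027-01-13');
    -- Assumption: a fixed date replaces 'NOW' so the result is stable.
    create view test_view as
      select id, expiration_date,
        case when julianday('2022-03-20') >= julianday(expiration_date)
             then 1 else 0 end as has_expired
      from test;
    """
)


def count(sql):
    """Return the single count(*) value produced by sql."""
    return conn.execute(sql).fetchone()[0]


# Calculated view column: no affinity, so integer 1 never equals text '1'.
view_int = count("select count(*) from test_view where has_expired = 1")
view_text = count("select count(*) from test_view where has_expired = '1'")

# Real table column: INTEGER affinity converts '1' before comparing.
table_int = count("select count(*) from test where id = 1")
table_text = count("select count(*) from test where id = '1'")

print(view_int, view_text, table_int, table_text)
```

The two view counts differ while the two table counts agree, which matches the behavior reported above.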

simonw avatar Mar 21 '22 22:03 simonw

Relevant section of the SQLite documentation: 3.2. Affinity Of Expressions:

When an expression is a simple reference to a column of a real table (not a VIEW or subquery) then the expression has the same affinity as the table column.

In your example, has_expired is no longer a simple reference to a column of a real table, hence the bug.

Then 4.2. Type Conversions Prior To Comparison fills in the rest:

SQLite may attempt to convert values between the storage classes INTEGER, REAL, and/or TEXT before performing a comparison. Whether or not any conversions are attempted before the comparison takes place depends on the type affinity of the operands.

simonw avatar Mar 21 '22 22:03 simonw

I wonder if this will be a problem with generated columns, or with SQLite strict tables?

My hunch is that strict tables will continue to work without any changes, because https://www.sqlite.org/stricttables.html says nothing about their impact on comparison operations. I should test this to make absolutely sure though.

Generated columns have a type, so my hunch is they will continue to work fine too.

simonw avatar Mar 21 '22 22:03 simonw

Thinking about options for fixing this...

The following query works fine:

select * from test_view where cast(has_expired as text) = '1'
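
Here is a small sketch of that cast with the value passed as a text parameter, the way a querystring value would arrive. The three-row table, the fixed date, and the :p0 parameter name are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table test (id integer, expiration_date text);
    insert into test values
      (0, '2018-01-04'), (4, '2022-01-08'), (9, '2027-01-13');
    -- Assumption: a fixed date replaces 'NOW' so the result is stable.
    create view test_view as
      select id, expiration_date,
        case when julianday('2022-03-20') >= julianday(expiration_date)
             then 1 else 0 end as has_expired
      from test;
    """
)

params = {"p0": "1"}  # text, as it would arrive from a URL

# Without the cast the integer result is compared to text and never matches.
plain = conn.execute(
    "select id from test_view where has_expired = :p0 order by id", params
).fetchall()

# With the cast both sides are text, so the comparison succeeds.
with_cast = conn.execute(
    "select id from test_view where cast(has_expired as text) = :p0 order by id",
    params,
).fetchall()

print(plain, with_cast)
```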

I don't want to start using this for every query, because one of the goals of Datasette is to help people who are learning SQL:

  • #1613

If someone clicks on "View and edit SQL" from a filtered table page I don't want them to have to wonder why that cast is there.

But... for querying views, the cast turns out to be necessary.

So one fix would be to get the SQL generating logic to use casts like this any time it is operating against a view.

An even better fix would be to detect which columns in a view come from a table and which ones might not, and only use casts for the columns that aren't definitely from a table.

The trick I was exploring here might be able to help with that:

  • #1293

simonw avatar Mar 21 '22 22:03 simonw

The alternative to using cast here would be for Datasette to convert the "1" to a 1 in Python code before passing it as a param.

This feels a bit neater to me, but I still then need to solve the problem of how to identify the "type" of a column that I want to use in a query.
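
A rough sketch of the Python-side conversion idea. It assumes the caller already knows the column is numeric, which is exactly the unsolved part; the table t, the view flags, and the :p0 parameter are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table t (n integer);
    insert into t values (0), (1);
    -- has_expired is a calculated column, so it has no type affinity.
    create view flags as
      select n, case when n = 1 then 1 else 0 end as has_expired from t;
    """
)

sql = "select count(*) from flags where has_expired = :p0"

raw = "1"  # value as received from the querystring

# Passing the string straight through matches nothing against the view.
as_text = conn.execute(sql, {"p0": raw}).fetchone()[0]

# Converting in Python first means SQLite compares integer to integer.
as_int = conn.execute(sql, {"p0": int(raw)}).fetchone()[0]

print(as_text, as_int)
```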

simonw avatar Mar 22 '22 17:03 simonw

No, I think I need to use cast - I can't think of any way to ask SQLite "for this query, what types are the columns that will come back from it?"

Even the details from the explain trick explored in #1293 don't seem to come back with column type information: https://latest.datasette.io/fixtures?sql=explain+select+pk%2C+text1%2C+text2%2C+[name+with+.+and+spaces]+from+searchable_view+where+%22pk%22+%3D+%3Ap0&p0=1

simonw avatar Mar 22 '22 17:03 simonw

Note that Datasette does already have special logic to convert parameters to integers for numeric comparisons like >:

https://github.com/simonw/datasette/blob/c4c9dbd0386e46d2bf199f0ed34e4895c98cb78c/datasette/filters.py#L203-L212

Though... it looks like there's a bug in that? It doesn't account for float values - "3.5".isdigit() returns False - probably for the best, because int(3.5) would truncate that value to 3 anyway.
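
To make the float gap concrete, here is a sketch of a conversion that tries int first and then float. coerce_numeric is a hypothetical helper written for this comment; it is not the code in filters.py.

```python
def coerce_numeric(value: str):
    """Hypothetical helper: return an int or float if value parses as one.

    Anything that parses as neither is returned unchanged. Note that
    float() also accepts strings such as "nan" and "inf", which a real
    implementation would probably want to reject.
    """
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value


# str.isdigit() rejects decimals and negative numbers.
digit_checks = ["70".isdigit(), "3.5".isdigit(), "-5".isdigit()]

# int() on a float drops the fractional part.
truncated = int(3.5)

converted = [coerce_numeric(v) for v in ["70", "3.5", "-5", "abc"]]
print(digit_checks, truncated, converted)
```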

simonw avatar Mar 22 '22 17:03 simonw

Also made me realize that this query:

select * from sortable where sortable > :p0

Only works here thanks to the column affinity thing kicking in too: https://latest.datasette.io/fixtures?sql=select+*+from+sortable+where+sortable+%3E+%3Ap0&p0=70
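
A small sketch of why affinity matters for > as well. The rows and the sortable_view view with its calculated sortable_calc column are assumptions for illustration, not the real fixtures database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table sortable (pk integer, sortable integer);
    insert into sortable values (1, 10), (2, 70), (3, 90), (4, 100);
    -- Assumption: "+ 0" turns the column into an expression with no affinity.
    create view sortable_view as
      select pk, sortable + 0 as sortable_calc from sortable;
    """
)

params = {"p0": "70"}  # text parameter, as passed in the URL

# Table column: INTEGER affinity converts '70' to 70 before comparing.
table_rows = conn.execute(
    "select sortable from sortable where sortable > :p0 order by sortable",
    params,
).fetchall()

# Calculated column: no conversion, and any integer sorts below any text.
view_rows = conn.execute(
    "select sortable_calc from sortable_view where sortable_calc > :p0",
    params,
).fetchall()

print(table_rows, view_rows)
```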

simonw avatar Mar 22 '22 17:03 simonw