better-sqlite3 icon indicating copy to clipboard operation
better-sqlite3 copied to clipboard

Export sqlite3_create_collation

Open BronislavKlucka opened this issue 3 years ago • 6 comments

Hello, it would be nice to add the ability to create own collation mechanism using sqlite3_create_collation similar as with custom functions and aggreagtes.

Thanks

BronislavKlucka avatar Jun 22 '21 20:06 BronislavKlucka

I wanted to do this, but the problem is that sqlite3_create_collation() doesn't appear to have any mechanism for reporting an error. Any JavaScript function can throw an error, and there would be no way to propagate that error to SQLite to abort the query. This could lead to incorrect query results or database corruption. For that reason, I've decided not to implement it.

JoshuaWise avatar Jun 25 '21 19:06 JoshuaWise

But surely when a collating JS function should throw an error, that would be visible / propagate to the point where the query is executed? Could one catch that condition and then continue the query (assuming it can not be stopped), replacing all return values with 0 (for 'a equals b')?

To be sure the linked docs do outline a number of invariables that must hold for a meaningful collating function (such as transitivity and so on). They do state that the effects of using a function that does not meet these invariables are undefined. Surely effects of using an inept collating function will be limited to it producing an unexpected or maybe an unstable sorting order (not a ruined DB file or anything like that). Given that it is hard to impossible for static code analysis and tests alike to prove such invariables to hold for all combinations of calling cmp( a, b ) for any sizable sets of values (e.g. all numbers), treating such cases as introducing undefined behavior (i.e. garbage in, anything out) or act-as-if semantics (should your collation function throw, ordering of results is not defined, as if you had only returned 0 from that point for all values) does not sound unreasonable.

I think the viable part is that exceptions in the collating function should not silently swallowed.

Other than that, it's very easy to write an insert or update that causes an error half way through, be it by calling an external function or not; in those cases, it would depend on the transaction to decide what state the DB should be in (and it could be in an invalid or 'unfortunate' state if each insert happened in its own transaction).

So what does make collating so different from other UDFs? And, if users had to exercise special care when using UD collations, wouldn't it be great if they could at least use unsafe mode AYOR?

FWIW I think the addition of UD single-valued, table-valued, aggregate and window functions, all of which can be written 'at whim' in JavaScript (and, potentially, Rust / WASM?) makes better-sqlite3 even better! I'm really pining for custom collations at this point...

loveencounterflow avatar Jul 07 '21 21:07 loveencounterflow

So I had a look at the source code and realized there's a lot of code to get lost in, I'm afraid I couldn't whip up a demo implementation to toy around with (I'd thought that the JS/C interfacing stuff can be largely automated?).

I then took a look at competing JS and Rust SQLite libraries; of those, only Rusqlite / libsqlite3-sys (which are the same? library, really? or depend on each other?) exposes an interface to create UDFs and virtual tables. Looks like there's only better-sqlite3 if you want this functionality after all! Unfortunately, this also meant that I couldn't just use another library to see how they deal with the error handling issue.

loveencounterflow avatar Jul 08 '21 10:07 loveencounterflow

Could one catch that condition and then continue the query (assuming it can not be stopped), replacing all return values with 0 (for 'a equals b')?

The query cannot be stopped. But continuing the query will lead to undefined behavior, which may cause database corruption.

Surely effects of using an inept collating function will be limited to it producing an unexpected or maybe an unstable sorting order (not a ruined DB file or anything like that)

That is not a safe assumption to make. Sorting is used for indexing, which is used for:

  • foreign key enforcement
  • unique constraint enforcement
  • primary key enforcement
  • etc.

If we break primary keys, then a countless number of extensions (some of which come built-in better-sqlite3) will break, since most extensions use primaries keys to correlate data.

Sorting is a fundamental aspect of a relational database. It's not just used for ORDER BY. The implications of allowing sorting to break are unpredictable and dangerous.

I think the viable part is that exceptions in the collating function should not silently swallowed.

Even if we allow the exception to propagate to JavaScript world, it's too late. The database might already be trashed.

Other than that, it's very easy to write an insert or update that causes an error half way through, be it by calling an external function or not; in those cases, it would depend on the transaction to decide what state the DB should be in (and it could be in an invalid or 'unfortunate' state if each insert happened in its own transaction).

Inserts and updates are high-level operations that were designed to support errors. sqlite3_create_collation() is a much lower-level operation, and it does not support errors.

So what does make collating so different from other UDFs?

Other UDFs explicitly support errors.

And, if users had to exercise special care when using UD collations, wouldn't it be great if they could at least use unsafe mode AYOR?

This could be a reasonable compromise.

JoshuaWise avatar Jul 08 '21 18:07 JoshuaWise

Sorting is a fundamental aspect of a relational database.

Which is exactly why I'm so eager to get this piece of functionality. I'd also love to see how a database gets corrupted by a rogue UDF cmp(). I think it shouldn't happen. Wonky outputs, sure, but it won't corrupt the DB file, will it?

What's more, I think better-sqlite3 is one of the only SQLite adapters out there that supports this wide a range of access to the SQLite API—single-valued and table-valued functions, aggregate and window functions, virtual tables, all of those are accessible and definable in the host language. At least within the JavaScript and Rust ecosystems, it doesn't look to me there's another adapter that offers this full range of options. If error communication is missing from sqlite3_create_collation(), maybe the SQLite devs could be convinced to add error handling to that function to bring it more in line with the existing UDF handling?

loveencounterflow avatar Jul 09 '21 07:07 loveencounterflow

+1, I'd love for this feature to be implemented. Would be fine for my use case to enable unsafe mode.

0xOlias avatar Oct 10 '23 04:10 0xOlias