prisma-engines icon indicating copy to clipboard operation
prisma-engines copied to clipboard

feat: geospatial types support

Open Oreilles opened this issue 9 months ago • 39 comments

Fixes https://github.com/prisma/prisma/issues/1798 Fixes https://github.com/prisma/prisma/issues/2789

This PR implements support for database geometry types. I'll start this message by saying that, although I'd be very proud of making such a contribution to Prisma, I understand that the likeliness of such a massive PR to be accepted is pretty low due to the time I'd take to fully review. How did I end up there ? In the end of last year, I tried to get in touch with the team as I had some experience in the field, and also had the time to work on the feature - but by the time we managed to exchange emails I no longer had enough time to invest in working on this. Last month, I had time and remembered about this project, so I started browing the prisma repo - and realised the ORM core was not actually Typescript, but full Rust, which I had no previous experience with. I questioned my ability to be up to the task, but though: why not have a go at it ? Not having sufficient rust experience, I decided against getting back in touch with the Prisma team and just leverage this project as a learning experience. A month later, although it is very far from being production ready, I believe this branch might deserve some attention which is why I decide to release it. Again, apologies about the size. I worked on this as a learning experience and ended up getting quite ahead of myself.. heh.

Now for the meat of the PR, here's what it includes:

In the psl / query-engine:

  • Two new ScalarTypes: Geometry and GeoJSON. Both types refer to the same underlying database types, they only differ in the geometry serialization format the user wants to use: GeoJSON for GeoJSON, EWKT for Geometry. Why do we need both ? GeoJSON is the format most people wants to use, as they most likely use TypeScript and GeoJSON is the de-facto standard for manipulating geometries on the web. But Ewkt allows for richer information about the raw geometry: it can be used to serialize geometry in any CRS (when GeoJSON is limited by specification to WGS84), and can also represent more geometry types than geojson (CIRCULARSTRING , CURVEPOLYGON etc). As such, only EWKT can be used to safely & exhaustively read database geometries.
  • A new filtering condition: Geometry(Not)Within mapping to st_within
  • A new filtering condition Geometry(Not)Intersects mapping to st_intersects

In quaint:

  • Two new enum Values: Geometry and Geography, which both holds the same type (GeometryValue, a struct with a WKT string and an SRID i32 ). We might have needed only one, but MSSQL required making the two explicitly different.
  • A new function geom_from_text, that maps to the SQL/MM st_geomfromtext.
  • A new function geom_as_text, that maps to st_astext.
  • A new compare function geometry_is_valid that maps to st_isvalid
  • A new compare function geometry_is_empty that maps to st_isempty
  • A new compare function geometry(_not)_within that maps to st_within
  • A new compare function geometry(_not)_intersects that maps to st_intersects
  • A new compare function geometry_type(_not)_equals that maps to st_geometrytype(x) = ...

All in all, what works:

  • GeoJSON & EWKT I/O for CockroachDB, MySQL, MSSQL
  • GeoJSON & EWKT I/O for PostgreSQL (as long as PostGIS is installed and the public schema is exposed to Prisma)
  • GeoJSON & EWKT I/O for SQLite (as long as Spatialite is available and exposed with SPATIALITE_PATH)
  • GeoJSON I/O for MongoDB
  • Within and Intersects operators for all vendors except MongoDB (Mongo has the $geoWithin and $geoIntersects query operators, however they cannot be used with the current connector implementation, since those are Query operators wherehas Prisma uses Aggregation operators. Unfortunately, they're not compatible, and no geospatial filtering operators are available for aggregate queries in MongoDB.
  • Introspection / Schema description for all vendors (both directions)

What hasn't been fully implemented / properly tested:

  • Migrations (particularily casts)
  • Default value handling
  • Index type & operators handling (postgres only ?)

There is still a lot of work to do here, but I was quite happy to get the core features (introspection, IO and filtering) going and though it was worth sharing.

My intent with this PR is to show a possible implementation, start a discussion with the team and community about its viability, potential drawbacks and help making these features reach Prisma in a production ready state. Questions about the implementation details are welcome and encouraged. Please get in touch at aurele.nitoref at icloud if you'd like to discuss this more thoroughly!

Oreilles avatar Sep 06 '23 14:09 Oreilles

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 06 '23 14:09 CLAassistant

Absolutely boss 🚀

jordiup avatar Sep 07 '23 09:09 jordiup

CodSpeed Performance Report

Merging #4208 will not alter performance

Comparing Oreilles:feat/geometry (e0f834d) with main (5c4707e)

Summary

✅ 11 untouched benchmarks

codspeed-hq[bot] avatar Sep 07 '23 19:09 codspeed-hq[bot]

Amazing.

So that the tests are automatically run going forward: Can you quickly do a PR to main where you change line 5 to [![Cargo docs](https://github.com/prisma/prisma-engines/actions/workflows/on-push-to-main.yml/badge.svg)](https://github.com/prisma/prisma-engines/actions/workflows/on-push-to-main.yml)? I can then merge that, and the repository will remember that we trust you and run the tests without me clicking a button. (And that button needs to be fixed anyway 😆)

janpio avatar Sep 07 '23 19:09 janpio

So there was quite a lot of things I had overlooked when testing on my machine..! I squashed all the fixes and we now have a mostly green test suite 👍

Oreilles avatar Sep 08 '23 13:09 Oreilles

🚀

Two test failures left for now:

  1. Something with 1 PostgreSQL metrics test, 7 v.s 9 metrics recorded or something. Could your PR cause this? (We recently change something in metrics, so you might want to rebase (Update branch button in GH UI) to get the most recent version of that code).
  2. A lot of unhappiness in the Vitess 8 tests with it not being able to introspect the tables because of permission problems. Strange. That looks a bit ... weird, even flaky maybe? I retriggered a test run there. Maybe it just goes green.

PS: No need from our side to combine your PRs and force push, we only merge via "Squash and merge" which combines all the commits, and with the individual commits of the fixes it might be easier to understand what test failures where fixed where - or introduced.

janpio avatar Sep 08 '23 15:09 janpio

So the two additional queries breaking the tests are due to:

CREATE EXTENSION IF NOT EXISTS postgis https://github.com/Oreilles/prisma-engines/blob/a588e1296288b79df0204397a441f2b3d891b9d9/query-engine/connector-test-kit-rs/qe-setup/src/postgres.rs#L42

and

SELECT PostGIS_version() https://github.com/Oreilles/prisma-engines/blob/a588e1296288b79df0204397a441f2b3d891b9d9/schema-engine/connectors/sql-schema-connector/src/flavour/postgres.rs#L628-L630

Both are needed to run the tests, I updated the counter to 9 to make the test succeed. ( Also rebased on main just in case... )

No clue about the Vitess issue though...

~~EDIT: I tried running the vitess tests on prisma:main on my machine and the result is the same, which makes me assume that this is not due to a code change in this PR...~~

EDIT: it's now all fixed 🙌

Oreilles avatar Sep 08 '23 17:09 Oreilles

How can I help to merge it?

di-sukharev avatar Oct 19 '23 02:10 di-sukharev

Test it, figure out how it would fit into Prisma, let the author and us know what else is needed to make this a complete feature.

janpio avatar Oct 19 '23 09:10 janpio

@janpio @Oreilles thanks for all you do guys. I am at the point in my project to start introducing PostGIS and trying to figure out how to test this PR. I cloned @Oreilles feature branch and compiled debug targets of prisma-engines how do i tie it into existing prisma to run test migration/introspection ? thank you

okonon avatar Oct 23 '23 21:10 okonon

@okonon Have a look at the documentation: https://www.prisma.io/docs/concepts/components/prisma-engines#using-custom-engine-libraries-or-binaries

RafaelKr avatar Oct 24 '23 08:10 RafaelKr

@RafaelKr thanks a lot for that sorry i missed that section. I was able to test introspection from existing database (prisma db pull): image Looks like it is introspecting correctly.

okonon avatar Oct 25 '23 17:10 okonon

Adding a comment to voice support for this to be added to Prisma! Looking forward to seeing this merged; keep up the good work.

beardo01 avatar Nov 08 '23 00:11 beardo01

Definitely would love that to be merged as well :+1:

toniopelo avatar Nov 08 '23 11:11 toniopelo

Sorry, but this will definitely not be merged as it is only part of the full feature that needs to be implemented.

But we are interested in bringing it there, so let's see what we can do here.

janpio avatar Nov 08 '23 21:11 janpio

For now I created a branch based on the commit this was forked from, so that there are no conflicts and @Oreilles can keep committing if they want.

I also created a local branch of the code that runs all our CI (Buildkite is not run on external contributions), with the goal of publishing the engines to our infrastructure, so we can build a Prisma CLI and Client with them for further testing.

But unfortunately some tests are failing (results only internally visible): https://github.com/prisma/prisma-engines/pull/4427 image Turns out GitHub Actions do not run for these databases, and seems the current code has some problems - and that is blocking my plan to release these engine files :/

Random example of a SQLite failure, our of many ([2023-11-08T21:25:35Z] test result: FAILED. 622 passed; 970 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.73s):

[2023-11-08T21:25:35Z] thread 'writes::top_level_mutations::delete_many_relations::delete_many_rels::pm_c1_no_children' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: QueryError(SqlInputError { error: Error { code: Unknown, extended_code: 1 }, msg: "no such function: InitSpatialMetaData", sql: "SELECT InitSpatialMetaData()", offset: 7 }), original_code: Some("1"), original_message: Some("no such function: InitSpatialMetaData") }', /root/build/query-engine/connector-test-kit-rs/qe-setup/src/sqlite.rs:9:65

MySQL 5.6 only fails 2:

[2023-11-08T21:35:48Z] failures:
[2023-11-08T21:35:48Z]     queries::filters::geometry_filter::geometry_filter_spec::geometric_comparison_filters
[2023-11-08T21:35:48Z]     writes::data_types::native_types::mysql::mysql::native_geojson_srid_geometry_types

5.7 only 1 even:

[2023-11-08T21:39:12Z] failures:
[2023-11-08T21:39:12Z]     writes::data_types::native_types::mysql::mysql::native_geojson_srid_geometry_types

MariaDB the same as 5.6:

[2023-11-08T21:31:19Z] failures:
[2023-11-08T21:31:19Z]     queries::filters::geometry_filter::geometry_filter_spec::geometric_comparison_filters
[2023-11-08T21:31:19Z]     writes::data_types::native_types::mysql::mysql::native_geojson_srid_geometry_types

Trying to get these tests to run on GitHub Actions now. But maybe you have speculative fixes @Oreilles?

janpio avatar Nov 08 '23 21:11 janpio

Interesting, as far as I can remember (almost) all tests were succeeding the last time we discussed the PR, and we did have tests for all MySQL versions... though I might remember wrong. Did you resolve merge conflicts when you merged main, which could explain tests failing afterwards ?

Anyway, I'll have a look as soon as possible. ~~@janpio if that's something you can do, please send me the test cli commands to reproduce the failing tests.~~

Edit: nevermind, I had not seen that they were already mentioned in your message!

Oreilles avatar Nov 08 '23 22:11 Oreilles

Yes, I realize that the main merge was not optimal to fully reproduce what was going on here - sorry. I thought about reproducing the state before I did that and make that run in Buildkite, but I couldn't figure out how really.

janpio avatar Nov 08 '23 22:11 janpio

Should be good now:

  • SQLite tests were failing due to an unsuccesfull initialization of Spatialite, which I had unwrapped - it's now oked instead. Tests involving Spatialite will be filtered out later if it is not available.

  • The write test failing with MySQL was the one involving geometry columns with srid constraints, which are only available starting with MySQL 8, hence the failing schema initialization.

  • The last one is a "fun" one: in MySQL 5.6 and MariaDB, the result of this expression is false:

    elect st_intersects(
     st_geomfromtext('POINT(0 0)'),
     st_geomfromtext('POINT(0 0)')
    ;
    

    I replaced the intersection test with one working with all vendors.

Oreilles avatar Nov 13 '23 01:11 Oreilles

Merged main again and had to solve quite a lot of conflicts: there apparently was some pretty big change in the code base recently (I'd be interested in knowing what were the motivation behind the quaint's new ValueType).

Unfortunately it seems like some of the recent changes broke the current implementation, as the query engine tests are now failing for mssql and postgres, which will likely take a bit longer to figure out

Oreilles avatar Nov 13 '23 14:11 Oreilles

Thanks for even tackling the merge conflicts, that was not expected.

I suggest you add this to your query-engine.yml so that it runs the same tests as Buildkite will: https://github.com/prisma/prisma-engines/pull/4428/files You can see that MySQL (5.7 and 8) and MariaDB fail on GH Actions with 1 test - but we should be able to ignore that here and Buildkite would then still pass (I opened an issue to clean this up at https://github.com/prisma/prisma/issues/21896, but no one had time to look at it yet).

Looking at your last commit's results, you can definitely ignore:

  • https://github.com/prisma/prisma-engines/actions/runs/6850990006?pr=4208 (we are currently cleaning those up and will have them green soon I hope)
  • https://github.com/prisma/prisma-engines/actions/runs/6850990009/job/18626346956?pr=4208 and https://github.com/prisma/prisma-engines/actions/runs/6850990009/job/18626347232?pr=4208 - those are flaky, so I could rerun them a few times to make them pass, but that is ultimately irrelevant

All others are the same singular test that fails: writes::top_level_mutations::create::geometry_create::create_geometry - not too bad. I was afraid there was more after your message above.

janpio avatar Nov 13 '23 18:11 janpio

Turns out it was a much smaller issue that I thought! I squashed the main merge to avoid polluting this PR so for reference, this line was the culprit (the null geometry case was seemingly handled elsewhere previously)

https://github.com/Oreilles/prisma-engines/blob/59410f64952de4ed512f8983a4e242b5f327ef5c/quaint/src/visitor.rs#L220

Oreilles avatar Nov 13 '23 19:11 Oreilles

I'd be interested in knowing what were the motivation behind the quaint's new ValueType

This is what my colleague shared:

Previously quaint::Value was current quaint::ValueType it was a enum, which means that if we needed to enrich the representation of a value with metadata for all variants, we needed to modify the source code wherever any variant was referenced. To solve this, and add metadata about a the native column type for postgres (needed to encode information for driver adapters) we promoted quaint::Value from an enum to a struct, and use the old enum as a property of said struct, but renaming the type to quiant::ValueType

Some more information: https://github.com/prisma/prisma-engines/pull/4311

janpio avatar Nov 14 '23 09:11 janpio

thanks for moving this forward.

okonon avatar Nov 16 '23 15:11 okonon

Another look at the failing tests:

  • Driver Adapter, LibSQL and SQLite are probably because missing extension? (The LibSQL one we could ignore for now)
  • All others are the same test writes::top_level_mutations::create::geometry_create::create_geometry caused by Inconsistent column data: Conversion failed: Value of type Column { catalog: ConstBytes(PhantomData<mysql_common::packets::Catalog>)
  • Vitess ones are flaky and can be ignored

janpio avatar Nov 17 '23 09:11 janpio

(Let me know if you prefer to fix the resolve conflicts, or if I should do a branch from yesterday's state so that your PR here does not have conflicts and can keep building and running in CI)

janpio avatar Nov 27 '23 19:11 janpio

I rebased the branch on main so there shouldn't be any more conflicts for now. I also managed to add the Spatialite tests to the CI 👍

There's a failing test I can't wrap my head around: https://github.com/prisma/prisma-engines/actions/runs/7017021601/job/19089461750?pr=4208

{"errors":[{"error":"Error occurred during query execution:\nConnectorError(ConnectorError { user_facing_error: None, kind: QueryError(Server(MysqlError { code: 4079, message: \"Illegal parameter data type varchar for operation 'st_within'\", state: \"HY000\" })), transient: false })","user_facing_error":{"is_panic":false,"message":"Error occurred during query execution:\nConnectorError(ConnectorError { user_facing_error: None, kind: QueryError(Server(MysqlError { code: 4079, message: \"Illegal parameter data type varchar for operation 'st_within'\", state: \"HY000\" })), transient: false })","backtrace":null}}]}

The strange part is that this test is succeeding for all MySQL versions except MariaDB. To give a bit of context, I decided to handle geometries differently for the MySQL connector to avoid a classic footgun: MySQL 8 broke backwards compatibility by swapping coordinates in WKT representation for WGS84 geometries (https://dba.stackexchange.com/questions/242001/mysql-8-st-geomfromtext-giving-error-latitude-out-of-range-in-function-st-geomfr). There is a way to prevent this behavior by using st_geomfromtext and st_astext with a third argument 'axis-order=long-lat' - but unfortunately, I didn't find a way in Quaint to output custom SQL depending on the connected database version. So instead of using those function, I'm reading and writing the internal binary blob representation directly. This seems to work fine for everything except for this specific test with MariaDB, where calling ST_Within(geom, <geometry binary blob>) fails with the aforementioned error: Illegal parameter data type varchar for operation 'st_within'. For some reason, the binary blob seems to be interpreted as varchar... or is there something else I missed ?

Oreilles avatar Nov 28 '23 03:11 Oreilles

Seems like you are getting somewhere.

For the Spatialite problem:

No space left on device : '/home/ubuntu/actions-runner/_diag/blocks/acc9258d-1293-4903-99d1-cd922e79f8df_4482dee1-f750-5a60-3af2-b3e57514afa7.2'

There are actions that makes more space on the runners: https://github.com/marketplace/actions/maximize-build-disk-space and a few more. But is it expected to take that much space actually?

Only a small number of failing tests on MySQL(like) databases either.

(You can continue ignoring the flakyness in the driver adapter test)

janpio avatar Nov 29 '23 17:11 janpio

The failing test on MySQL 5.7 and 8 isn't caused by this PR, as you can see it's also failing on your PR adding those versions to the query engine test CI: https://github.com/prisma/prisma-engines/actions/runs/6870390796/job/18685172620?pr=4428

Regarding the "No space left" error, the problem is that an empty Sqlite database with Spatialite loaded (with SELECT InitSpatialMetadata();) is 7Mo, and since the database file are not cleaned up after each test... well after 330 tests that adds up to a lot 😄. We can circumvent this by initializing Spatialite with SELECT InitSpatialMetadata('WGS84_ONLY') as that avoids fully loading the spatial_ref_sys table taking all the space. However, this will also affect usage outside the test suite, since all SRIDs are currently allowed for the Sqlite connector. I'll add this option for now so that the tests can run while we decide on the best way to handle this.

Regarding the MariaDB failing st_within test I'd be glad if anyone has any insight on what could cause the bug (Why would a binary blob be sent as varchar, and only with MariaDB ..?)

Oreilles avatar Nov 29 '23 22:11 Oreilles

Thanks for the reminder that our MySQL tests are failing by default, you are of course correct :D

janpio avatar Nov 30 '23 10:11 janpio