chai
chai copied to clipboard
Add Joins
Abstract
This proposal is mostly based on #297
Proposal: JOINs
For now, Genji have no ways to select several tables in one query. JOINs is the most commonly used way in relation databases.
Changes
-
Add SQL JOIN operation parser.
-
Add a Join operation to the planner and nodes which implementing it. Joining algorithms implemented by #297:
- Nested loops join Basic algorithm with O(n*m) complexity in worst case. Implementation: https://github.com/tdakkota/genji/blob/feature/inner-join/sql/planner/join.go#L65
- Hash join Algorithm with O(n+m) complexity in worst case, but requires a Equi-join predicat in join clause. Implementation: https://github.com/tdakkota/genji/blob/feature/inner-join/sql/planner/join.go#L140
Joining algorithms which not implemented by #297:
- Sort-merge join Algorithm with O(n+m) complexity in worst case, but requires sorted document stream, so with sorting complexity becomes quasilinear O(nlogn+mlogm) Best in-use with index.
A good note about implementing joins from CrateDB blog: https://crate.io/a/joins-faster-part-one/
-
Add a optimizator rule to determine which algorithm should be used For now, #297 adds a optimizer rule which replaces a Nested loops join to Hash join if clause is a equi-join.
-
Add a special joined document type which allow to access fields from all joined tables. And there a problem, Genji is not a relational database, document fields can be undeclared in table definition. So any relative field path in query can be ambigous.
Some ideas to solve:
- Force to use absoulte path to field when query contains JOIN clause, i.e.
table_name.path_to_field
- Disallow use fields which are not defined in table when query contains JOIN
- Force to use absoulte path to field when query contains JOIN clause, i.e.
This probably wont work for all uses, but in my situation, I was coming from SQLite and was using NATURAL JOIN
with Junction tables. Using Genji I was able to ditch that paradigm using a model like this:
type Artist struct {
ID int
Name string
Album []int
}
type Album struct {
ID int
Name string
Song []int
}
type Song struct {
ID int
Name string
}
So as can be seen, Genji has the ability to do Slice values. It some cases, this can obviate the need for junction tables, and in turn JOIN as well. Again this might not be the case for all users, but it was for me.
@89z Great work
I would like to extend your example to add graphQL: https://github.com/UHN/ggql/tree/master/examples/reflection#define-the-api. This is a new graphql lib that is much faster than others.
With Genji backing it , it will be highly efficient.
Maybe we can make it a genji example too that people can use as a reference.
Just out of curiosity, would interleaving tables as seen in Cloud Spanner be a suitable alternative to joins? While it's not a drop-in replacement for a traditional table join, it could provide additional performance in the form of data locality.
In the context of Cloud Spanner, interleaved tables also allow for better performance in a distributed context. I've seen several requests here for Raft or other HA implementations and while it sounds like that's a "maybe down the road" feature, interleaved tables could solve the future problem of joins between tables distributed across nodes by co-locating tables with each other on the same node.
Got links on this approach by any chance ?
On Sat 8. May 2021 at 21:49, Clark McCauley @.***> wrote:
Just out of curiosity, would interleaving tables https://cloud.google.com/spanner/docs/schema-and-data-model#parent-child_table_relationships as seen in Cloud Spanner be a suitable alternative to joins? While it's not a drop-in replacement for a traditional table join, it could provide additional performance in the form of data locality.
In the context of Cloud Spanner, interleaved tables also allow for better performance in a distributed context. I've seen several requests here for Raft or other HA implementations and while it sounds like that's a "maybe down the road" feature, interleaved tables could solve the future problem of joins between tables distributed across nodes by co-locating tables with each other on the same node.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/genjidb/genji/issues/3#issuecomment-835488249, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMVPLFDHAXCVLGVGKV2UPITTMWIVJANCNFSM4HI6M3AQ .
-- Mobile: +4915231894553 Telegram: http://t.me/gedw99
Besides the link I cited in my OP, the following links may be helpful.
- https://cloud.google.com/spanner/docs/whitepapers/optimizing-schema-design#table_layout
- See 2.3 Data Model https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
Sadly, I haven't found much by way of implementation details.
Instead of joins, graph model can be implemented, its faster. Take a look at https://surrealdb.com/docs/surrealql/statements/relate @tdakkota @clarkmcc