chai icon indicating copy to clipboard operation
chai copied to clipboard

Add Joins

Open asdine opened this issue 5 years ago • 6 comments

asdine avatar Apr 28 '19 12:04 asdine

Abstract

This proposal is mostly based on #297

Proposal: JOINs

For now, Genji have no ways to select several tables in one query. JOINs is the most commonly used way in relation databases.

Changes

  • Add SQL JOIN operation parser.

  • Add a Join operation to the planner and nodes which implementing it. Joining algorithms implemented by #297:

    • Nested loops join Basic algorithm with O(n*m) complexity in worst case. Implementation: https://github.com/tdakkota/genji/blob/feature/inner-join/sql/planner/join.go#L65
    • Hash join Algorithm with O(n+m) complexity in worst case, but requires a Equi-join predicat in join clause. Implementation: https://github.com/tdakkota/genji/blob/feature/inner-join/sql/planner/join.go#L140

    Joining algorithms which not implemented by #297:

    • Sort-merge join Algorithm with O(n+m) complexity in worst case, but requires sorted document stream, so with sorting complexity becomes quasilinear O(nlogn+mlogm) Best in-use with index.

    A good note about implementing joins from CrateDB blog: https://crate.io/a/joins-faster-part-one/

  • Add a optimizator rule to determine which algorithm should be used For now, #297 adds a optimizer rule which replaces a Nested loops join to Hash join if clause is a equi-join.

  • Add a special joined document type which allow to access fields from all joined tables. And there a problem, Genji is not a relational database, document fields can be undeclared in table definition. So any relative field path in query can be ambigous.

    Some ideas to solve:

    • Force to use absoulte path to field when query contains JOIN clause, i.e.table_name.path_to_field
    • Disallow use fields which are not defined in table when query contains JOIN

tdakkota avatar Nov 13 '20 19:11 tdakkota

This probably wont work for all uses, but in my situation, I was coming from SQLite and was using NATURAL JOIN with Junction tables. Using Genji I was able to ditch that paradigm using a model like this:

type Artist struct {
   ID int
   Name string
   Album []int
}

type Album struct {
   ID int
   Name string
   Song []int
}

type Song struct {
   ID int
   Name string
}

So as can be seen, Genji has the ability to do Slice values. It some cases, this can obviate the need for junction tables, and in turn JOIN as well. Again this might not be the case for all users, but it was for me.

89z avatar Jan 01 '21 15:01 89z

@89z Great work

I would like to extend your example to add graphQL: https://github.com/UHN/ggql/tree/master/examples/reflection#define-the-api. This is a new graphql lib that is much faster than others.

With Genji backing it , it will be highly efficient.

Maybe we can make it a genji example too that people can use as a reference.

gedw99 avatar Jan 02 '21 10:01 gedw99

Just out of curiosity, would interleaving tables as seen in Cloud Spanner be a suitable alternative to joins? While it's not a drop-in replacement for a traditional table join, it could provide additional performance in the form of data locality.

In the context of Cloud Spanner, interleaved tables also allow for better performance in a distributed context. I've seen several requests here for Raft or other HA implementations and while it sounds like that's a "maybe down the road" feature, interleaved tables could solve the future problem of joins between tables distributed across nodes by co-locating tables with each other on the same node.

clarkmcc avatar May 08 '21 19:05 clarkmcc

Got links on this approach by any chance ?

On Sat 8. May 2021 at 21:49, Clark McCauley @.***> wrote:

Just out of curiosity, would interleaving tables https://cloud.google.com/spanner/docs/schema-and-data-model#parent-child_table_relationships as seen in Cloud Spanner be a suitable alternative to joins? While it's not a drop-in replacement for a traditional table join, it could provide additional performance in the form of data locality.

In the context of Cloud Spanner, interleaved tables also allow for better performance in a distributed context. I've seen several requests here for Raft or other HA implementations and while it sounds like that's a "maybe down the road" feature, interleaved tables could solve the future problem of joins between tables distributed across nodes by co-locating tables with each other on the same node.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/genjidb/genji/issues/3#issuecomment-835488249, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMVPLFDHAXCVLGVGKV2UPITTMWIVJANCNFSM4HI6M3AQ .

-- Mobile: +4915231894553 Telegram: http://t.me/gedw99

gedw99 avatar May 08 '21 19:05 gedw99

Besides the link I cited in my OP, the following links may be helpful.

  • https://cloud.google.com/spanner/docs/whitepapers/optimizing-schema-design#table_layout
  • See 2.3 Data Model https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf

Sadly, I haven't found much by way of implementation details.

clarkmcc avatar May 08 '21 20:05 clarkmcc

Instead of joins, graph model can be implemented, its faster. Take a look at https://surrealdb.com/docs/surrealql/statements/relate @tdakkota @clarkmcc

tomasweigenast avatar Dec 07 '22 10:12 tomasweigenast