prisma1
prisma1 copied to clipboard
Improving performance with Indexes
Please see latest spec in: https://github.com/prisma/prisma/issues/3405
In order to be able to optimise performance for the actual workload, it should be possible for the development team to specify the exact indexes they need.
Indexes currently being created
primary key
All models are backed by a single database table with a id
column containing the node id. This column is indexed as the primary key.
unique fields
Uniqueness is guaranteed by creating a unique index on the column.
Example
type User @model {
id: ID! @isUnique
name: String @isUnique
city: String
}
This model generates a table with two indexes:
- primaryKey(id)
- unique(name)
relations
All relations are implemented with a intermediate relation table:
type User @model {
id: ID! @isUnique
name: String
posts: [Post!]! @relation(name: "Posts")
}
type Post @model {
id: ID! @isUnique
title: String
creator: user! @relation(name: "Posts")
}
generates three database tables:
User
id | name |
---|---|
string | string |
- primaryKey(id)
Post
id | title |
---|---|
string | string |
- primaryKey(id)
Posts
id | A | B |
---|---|---|
string | string | string |
- primaryKey(id)
- unique(id)
- index(A)
- index(B)
- unique(A,B)
Simplifying indexes on relation tables
Remove the id coulmn
There is no need for a dedicated id column on relation tables
Instead, use the compound index A,B as the primary key to guarantee that a node can only be inserted once into the relation
Two indexes is enough to provide fast traversal
To provide fast traversal in both directions we need two indexes: one on A and one on B.
As we already have a compound index with A as prefix, we just need an index on B
We end up with this structure:
A | B |
---|---|
string | string |
- primaryKey(A,B)
- index(B)
Complexity Analysis
Single Model
non unique field
{allUsers(filter:{age: 25})}
Currently: Tablescan
Best Case: index lookup | index(age)
{allUsers(filter:{age: 25, orderBy: MONEY_ASC})}
Currently: Tablescan
Best Case: index lookup | index(age, money)
More Realistic: index lookup + filesort | index(age)
two fields
{allUsers(filter:{age: 25, name: "Karl"})}
Currently: Tablescan
Best Case: compound index | index(age, name)
More Realistic: Index intersection or single key index | index(age), index(name)
{allUsers(filter:{age: 25, name: "Karl"}, orderBy: MONEY_ASC)}
Currently: Tablescan
Best Case: compound index | index(age, name, money)
More Realistic: Index intersection or single key index + filesort | index(age), index(name)
range query
{allUsers(filter:{age_gt: 25})}
Currently: Tablescan
Best Case: index seek | index(age)
{allUsers(filter:{age_gt: 25}, orderBy: AGE_ASC)}
Currently: Tablescan
Best Case: index seek | index(age)
{allUsers(filter:{age_lt: 25}, orderBy: AGE_ASC)}
Currently: Tablescan
Best Case: index seek + filesort | index(age) (range filter and orderby not compatible)
{allUsers(filter:{age_gt: 25}, orderBy: MONEY_ASC)}
Currently: Tablescan
Best Case: index seek + filesort | index(age)
range query and equality
{allUsers(filter:{age_gt: 25, name: "Karl"})}
Currently: Tablescan
Best Case: compound index seek | index(name, age)
More Realistic: index seek + table lookup | index(name)
[Sorting same as range query]
range query on two fields
{allUsers(filter:{age_gt: 25, money_gt: 10000})}
Currently: Tablescan
Best Case: compound index seek | index(name, money)
More Realistic: index seek + table lookup | index(money)
{allUsers(filter:{age_gt: 25, money_gt: 10000}, orderBy: MONEY_ASC)}
Currently: Tablescan
Best Case: compound index seek | index(name, money)
More Realistic: index seek + table lookup + filesort | index(money)
{allUsers(filter:{age_gt: 25, money_gt: 10000}, orderBy: FAMILYMEMBERS_ASC)}
Currently: Tablescan
Best Case: compound index seek | index(name, money, familyMembers)
More Realistic: index seek + table lookup + filesort | index(money)
Multiple Models
simple relation
{User(id:""){posts{title}}}
Currently: 2 x index lookup | index(A), index(id)
Best Case: 2 x index lookup | index(A), index(id)
{User(id:""){posts(orderBy: LIKES_ASC){title}}}
Currently: 2 x index lookup + filesort | index(A), index(id)
Best Case: 2 x index lookup | mirror(posts.likes), index(A, likes), index(id)
relation with single equality
{User(id:""){posts(filter:{title: "nice title"}){title}}}
Currently: index lookup + overfetching index lookup | index(A), index(id)
Best Case: 2 x index lookup | mirror(posts.title) index(A, title), index(id)
{User(id:""){posts(filter:{title: "nice title"}, orderBy: LIKES_ASC){title}}}
Currently: index lookup + overfetching index lookup + filesort | index(A), index(id)
Best Case: 2 x index lookup | mirror(posts.likes), mirror(posts.title), index(A, title, likes), index(id)
More Realistic: index lookup + overfetching index lookup | mirror(posts.likes), index(A, likes), index(id)
relation with range query
{User(id:""){posts(filter:{likes_gt: 3}){title}}}
Currently: index lookup + overfetching index lookup | index(A), index(id)
Best Case: index seek + index lookup | mirror(posts.likes) index(A, likes), index(id)
{User(id:""){posts(filter:{likes_gt: 3}, orderBy; AGE_ASC){title}}}
Currently: index lookup + overfetching index lookup + filesort | index(A), index(id)
Best Case: index lookup + overfetching index lookup | mirror(posts.age) index(A, age), index(id)
{User(id:""){posts(filter:{likes_gt: 3}, orderBy; LIKES_ASC){title}}}
Currently: index lookup + overfetching index lookup + filesort | index(A), index(id)
Best Case: index lookup + index seek | mirror(posts.likes) index(A, likes), index(id)
Mirror fields to relation table
Many of the Best Case and Realistic Case scenarios above depend on one or more columns from a model table to be mirrored in the relation table. Mirroring a column into the relation table enables us to create a compound index across relations. In the relation example above, we might introduce a mirrored field like this:
type User @model {
id: ID! @isUnique
name: String
posts: [Post!]! @relation(name: "Posts")
}
type Post @model {
id: ID! @isUnique
title: String # mirror this field into Posts relation
creator: User! @relation(name: "Posts")
}
generates three database tables:
User
id | name |
---|---|
string | string |
- primaryKey(id)
Post
id | title |
---|---|
string | string |
- primaryKey(id)
Posts
id | A | B | B_title |
---|---|---|---|
string | string | string | string |
- primaryKey(A,B)
- index(B)
- index(A, B_title)
The new compound index allows us to speed up queries like this:
{allUsers(filter:{posts_any: {title: "A Title"}}) {...} }
Mirrored fields must be kept in sync by the graphcool backend
Specifying indexes in types.graphql
Design Goals
- Should be as intuitive as possible
- Must support multiple indexes covering the same field (a directive can only be applied once to a field)
- Must support indexes across relations (including fields on just one side and both sides)
Single Model
one field
type User @model {
id: ID! @isUnique
name: String @index
}
# or specify sort order
type User @model {
id: ID! @isUnique
name: String @index(sort: ASC)
}
multiple fields
type User @model {
id: ID! @isUnique
name: String @index(groups: [{name: "combined_name_and_age", position: 0}])
age: String @index(group: [{name: "combined_name_and_age", position: 1}])
}
Across relations
one field
type User @model {
id: ID! @isUnique
name: String @index(groups: [{relation: "Posts"}])
posts: [Post!]! @relation(name: "Posts")
}
type Post @model {
id: ID! @isUnique
title: String
creator: User! @relation(name: "Posts")
}
Improves performance of queries like:
{allPosts(filter:{creator:{name: "John"}})}
multiple fields
type User @model {
id: ID! @isUnique
name: String @index(groups: [{name: "combined_name_and_title", position: 0, relation: "Posts"}])
posts: [Post!]! @relation(name: "Posts")
}
type Post @model {
id: ID! @isUnique
title: String @index(groups: [{name: "combined_name_and_title", position: 1, relation: "Posts"}])
creator: User! @relation(name: "Posts")
}
Improves performance of queries like:
{allPosts(filter:{title: "Great Title", creator:{name: "John"}})}
Unique Index
To specify a unique constraint covering multiple fields, simply use the @unique
directive instead of @index
:
type User @model {
id: ID! @isUnique
name: String @unique(groups: [{name: "combined_name_and_age", position: 0}])
age: String @unique(group: [{name: "combined_name_and_age", position: 1}])
}
@sorenbs An excellent and thorough analysis!
Before I go any further, you might have a typo in the definition of the 'Post table':
type Post @model {
id: ID! @isUnique
title: String
creator: Post! @relation(name: "Posts") // <-- should read 'creator: User! @relation(name: "Posts")?
}
I think the syntax also needs to cover simple non-join tables as per follows:
type Country @model {
id: ID!
countryName: String! @isUnique
regions: [Region!] @relation(name: "CountryRegions")
// Regions are States in the US, but 'Provinces' in other countries
}
type Region @model {
id: ID!
country: Country! @IsUnique(groups: [{name: "UK_CountryRegions", position: 0}])
stateName: String! @isUnique(groups: [{name: "UK_CountryRegions", position: 1}]
} // UK_ here means 'Unique Key'
Also, I do not think that generated M2M Join tables are a good idea. One thing I learned the hard way over the years, even after having been given this piece of advice, was that if a M2M join table is created solely for the purpose of joining, without any additional info in it, then the database analysis is probably missing some important feature, and needs to be looked at again. Some M2M join tables also sometimes join 3 or 4 tables.
In principle, a M2M join table needs to include some other qualifying columns, above and beyond the standard 'CreatedAt, CreatedByUser' etc. For eg:
Real-Estate Agents (who often work in pairs, and can work with agents from another Agency, under a deal arrangement), is mapped to a specific property, sometimes as the 'Lead' agent, sometimes as the 'Assistant'. So the M2M join table might look like:
type MapAgent2Property @model {
id: ID! @isUnique
agentID: ID!
propertyID: ID!
agentRole: RoleEnum! // 'Primary', 'Assistant', 'CoAgent' etc.
dealAgreementID: ID // The details of who gets what % of the sale
}
So what I am saying is that generated M2M tables are not really feasible. The DB analyst should be able to define the table properly, and define the necessary (multi-column) Unique keys to ensure consistency.
So, your proposed syntax is good, but does not need to be as complex as you are making it. I really like the 'sort_order' option as well.
Good job on the thorough analysis!
BTW, what MarkDown tag did you use to get the cool Markup on your types?
` ` ` graphql
awesome, thanks! @kbrandwijk
Thanks @Trellian - some good thoughts!
First - I have corrected the typo.
About the M2M join table: A core design goal of Graphcool is to enable pain-free schema migrations preserving existing data.
One of the design decisions this has led to is that we always use a M2M-style join table when creating relationships. This way, transforming a 1-m relationship into a M2M is literally a matter of changing the type of a single field to a list. As we don't have to move data around, we can perform this operation quickly without downtime and without loosing data. This also allows us to introduce Kims proposal https://github.com/graphcool/framework/issues/746 in a way that does not require changing the underlying datastructure.
We should carefully consider if this level of flexibility is really required and if the tradeoff in performance is worth it, so thank you for bringing it up.
An alternative could be that relations by default never use a relation table, and M2M relations are not supported. If a relation table is required (either to add extra fields on the relation or to create a M2M relation), then the @edge
type proposed by Kim is required.
Do you think that would be a better approach? My main concern is that this introduces extra complexity that makes it more difficult to get started with Graphcool.
@sorenbs I like the @edge syntax. It's effectively defining a join table.
But I don't want to get off topic here, though. We're talking about having multi-column unique indexes/keys. Once you have those in place, a whole new world opens up. Without them, one can't produce a real-world database, except using the BeforeHook
approach, which is effectively throwing away the support for such in the backend RDBMS, which every good RDBMS is exceptionally good at. Field Constraints, Table Constraints, Relation Constraints all belong in the RDBMS. That's what the RDBMS is there for, after all.
Can I suggest that we open a new topic for discussing the impact of Join tables? Please also note, that a Join table is just like any other, no need to abstract it out, really. If there is extra information on the 'edge' with a join table, then that info is useful. Even for the simplest case of just order-by.
@sorenbs I don't see https://github.com/graphcool/framework/issues/746 marked as 1.0 😄
I agree with @Trellian that multi-column uniqueness constraints are far more important than having a join table for being able to easily convert a one to many 1-M relationship into a many to many M2M relationship. I need many to many relationships occasionally, so don't want to get rid of that ability and realize that would still need a join table, but IMO it's going to be pretty rare to want to transition between those join situations, and I'd be fine with that being manual and cumbersome (need to create new tables, migrate data, rename) if it meant I got multi-column constraints (and indexes) which I need all the time.
The proposal mentions
To specify a unique constraint covering multiple fields, simply use the @unique directive instead of @index:
type User @model {
id: ID! @isUnique
name: String @unique(groups: [{name: "combined_name_and_age", position: 0}])
age: String @unique(group: [{name: "combined_name_and_age", position: 1}])
}
- Would this also allow for unique constraints spanning relations (also see #65), presumably with this syntax:
type User @model {
id: ID! @unique(groups: [{name: "post_title_per_user", position: 0}])
name: String
age: String
}
type Post @model {
id: ID! @isUnique
title: String! @unique(groups: [{name: "post_title_per_user", position: 1}])
}
- Is there a difference between
@isUnique
and@unique
?
@marktani in answer to #1
,
The necessary syntax is simpler than in #1
above, I think it should look like this:
type User @model {
id: ID! @Unique
name: String
age: String
Posts: [Post!] @relation(name: "user_posts")
}
type Post @model {
id: ID! @Unique
author: User! @relation(name: "user_posts) @unique(groups: [{name: "post_title_per_user", position: 0}])
title: String! @unique(groups: [{name: "post_title_per_user", position: 1}])
}
or maybe this would be simpler and easier to implement?:
type Post @model {
id: ID! @Unique
author: User! @relation(name: "user_posts)
title: String!
@unique({name: "UK_user_posts", columns: {"author", "title"})
// or even simpler, no need for a unique key name, most RBMS don't need them, or they `autogenerate`:
// @unique(columns: {"author", "title"})
}
Unique keys are Table constraints, and are never allowed to span tables directly in RDBMS theory.
For #2
, there is no difference between @isUnique
and @Unique
How will multi-field unique constraints be reflected in where
input objects?
Example schema:
type Post {
id: ID! @unique
title: String @unique(groups: [{name: "combined_title_and_slug", position: 0}])
slug: String @unique(group: [{name: "combined_title_and_slug", position: 1}])
}
Query by title
:
query {
post(where: {
title: "My biggest adventure"
}) {
id
title
published
}
}
Query by slug:
query {
post(where: {
id: "my-biggest-adventure"
}) {
id
title
published
}
}
Is there a way to fetch a post by a unique title
+ slug
combination? I thought of this:
query {
post(where: {
combined_title_and_slug: {
slug: "my-biggest-adventure"
title: "My biggest adventure"
}
}) {
id
title
published
}
}
I'd just like to mention that I'm really hurting for a compound primary key at the moment to provide an upsert on these fields
Yes, echoing the need for a multi-column index to enforce uniqueness.
Right now we are manually enforcing the combined uniqueness by performing a "where" at all places a creation can take place.
what about @fulltext()
index?
+1 for multi-column index and unique indexes!
Suggestion looks good!
Is anything happening here? How can we help?
About multi column unique indexes, which i'm in dire need of at the moment: Why have 2 decorators @isUnique
and @unique
- could that possibly be done with a single one?
Generally agree Foreign Keys, multi column index, and constraints are super important in a robust for production RBDMS. I dont understand the design decision for defaulting to join tables, seems optimizing for the wrong thing, 80-20 rule, relational dbs are ALL about relationships, native on those entities, FKs are important, if u need an entire new table every time u need a simple FK that is a huge overhead, huge performance impact adding joins to SQL, and just generally hard to work with engineering wise, especially coming into a project with a big graph/schema.
Adding to @bjm88 I really don't understand, how this feature could be not in the list of top-priorities. Running a production DB without keys and constraints and any kind of load more than nothing is absurd. And without compound indexes in lot cases you just can't guarantee data consistency.
@terion-name That's the argument I have been trying to make for months, and have been shot down repeatedly. It is such a fundamental feature that I seriously wonder if the guys behind graphcool have any real-world experience at all. I'm not trying to be nasty here. It's just an observation. I wouldn't even consider releasing a product like this without those basic features. It will be dead in the water on the first even slightly non-trivial database. You wouldn't even be able to get the basic old-favourite 'video-store' DB to work properly. @bjm88 I agree totally with you too. The solution as it stands is totally unworkable in the real world.
I started a medium-sized project with Prisma ... just after 3 days of inputting data into it, I started to notice that it chocked badly, because? ... no indexes. And there was a weird Apollo client bug which rendered everything just inconsistent. I thus gave-up and simply moved to another stack instead.
Not that I'm too impatient, but I have noticed in the past that features you ask don't usually get implement as eagerly as you'll want them to be (in the days of Graphcool and now), even if it seems important. Sure, this is open-source, but I don't have time to learn Scala and I don't see the need to. And there's the internal latency with Prisma, which seems to be an additional 100ms.
Well, all those things adding-up together: I had to explore my options.
@marktani please, this feature is absolutely crucial for production, a first-tear. no sense in adding connectors or any features if it just can't utilize db performance.
Switching to other solutions because of the lack of this feature.
Hi Prisma team, I think you are seeing that building an ORM it is crucial to get relationships and performance right working with databases. This project has such potential, I really hope you consider doing organic foreign keys on tables themselves and taking index setup and management seriously. Have you considered using GitCoin to have people be able to pay for features to help support them. I suspect this is actually not hard to implement, many other ORMs do it, but I simply cannot use this library without these basic design principles for working with a db, would be happy to support with $ if you enable it.
Adding in my vote for this ~~feature~~ requirement. Composite foreign keys, indexes and the implementation of proper RDBMS architecture are a must for this project.
For those that run into performance problems using Postgres, check out Hasura as an option. All things considered, i like prisma better - but when performance is important, this unfortunately is on the slower side atm.
Hey all,
thanks for sharing your concerns with us! :pray: First of all i would like to say that performance is an important topic for us and internally we are discussing this all the time. We are committed to improving performance as a continuous effort.
However indexes are a just one facet of performance. During the last weeks we created an extensive benchmarking suite which we used to drive our decisions on which areas must be improved first performance wise. In our tests just adding indexes did not help performance as much as we would like it to. Instead we identified our SQL queries and application code as the biggest problems right now.
Therefore we spent a lot of time during the last weeks to refactor our SQL queries to be more performant. In addition we profiled the Prisma application code a lot and were able to implement significant improvements for CPU and memory consumption. The first big batch of those improvements is landing in our release 1.14
which went into the beta stage this week. 🚀
If you are eager to give it a spin just use the tag/version 1.14-beta
. We have some more improvements in the pipeline for the subsequent releases as well. When we are through with all of those changes we have planned, we will turn to indexes again and i am confident that they will then bring additional improvements.
And there is one last thing around indexes: If you think that indexes are absolutely crucial you always have the possibility to just connect to your database and create them manually. I agree that this is not the best experience yet, but we really want to get indexes right before we ship anything half baked to our users.
PS: In case you hit any performance problems with 1.14
or later, we would really like to hear about it. Just contact me on our public Slack (@marcus
)so we can learn about your concrete problems and can come up with solutions in future releases. And i can probably give you a hint or two how your GraphQL query could be optimised 😃
@mavilein thank you for the detailed update. Can you speak to the compound primary key aspect of the discussion, please?
I.e. the ability to apply @unique to a combination of fields for a relation.
This aspect is the more limiting factor in terms of usability.
Re:https://github.com/prismagraphql/prisma/issues/171
@mavilein performant queries is a good thing, but saying that indexes are not efficient — is not correct. Now we simply can't make compound indexes (like unique(slug, lang)
) or fulltext indexes. We just can't optimize any selects beyond selections by ID (price range selects, date period selects and many other that are used in real-life applications). What is the problem? Why indexes is a problem for almost a year? It is far more easy thing than optimizing query builder and it is absolutely crucial and should be one of first features. And manual indexes just don't fit in current migrating model (that also needs to be changed, but it is a separate topic). In fact in any migrating model. It's just not an option for development cycle
@mavilein, if I create the unique constraints manually as you suggest, what are the chances that I will have problems later with e.g. data migrations? Thanks
@sorenbs
We should carefully consider if this level of flexibility is really required and if the tradeoff in performance is worth it, so thank you for bringing it up.
An alternative could be that relations by default never use a relation table, and M2M relations are not supported. If a relation table is required (either to add extra fields on the relation or to create a M2M relation), then the @edge type proposed by Kim is required.
Do you think that would be a better approach? My main concern is that this introduces extra complexity that makes it more difficult to get started with Graphcool.
Did you guys ever have a meeting(s) about this and pick a direction? These are very good questions that will dictate a ton of code you guys write, as well as how successful Prisma is.
fwiw, the "M2M everything!" approach you're currently drifting along with was the reason we jumped ship.
My main concern is that this introduces extra complexity that makes it more difficult to get started with Graphcool.
The complexity would be worth it. (I'd trade the complexity of developing/supporting 50 connectors to just supporting Postgres.)