PynamoDB
Multiple Models on top of One DynamoDB Table
Hi All,
DynamoDB best practices suggest that "most well designed applications require only one table". Has anyone tried to fit all relational entities in one DynamoDB table and access them through PynamoDB?
If anyone has tried following this best practice, please share your experience.
Thanks, Akash
I've done this using PynamoDB, but you have to be very conscious of how you manage your models' Meta classes, hash keys, etc. (if you implement inheritance).
The main thing to be careful of is how you manage your composite keys (assuming you're using composite keys for either your hash or range keys). I've found that using a custom Attribute for hash/range keys helps a lot and this strategy can enable some really powerful schemas on top of a single DynamoDB table. Especially now that Transactions are a thing.
Note too that you will likely have to override the query/get class methods to ensure your composite keys are serialised (for example, by prefixing a static value to the dynamic portion of the composite key) before calling the underlying query/get class methods.
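A minimal plain-Python sketch of that idea, assuming a static prefix such as `USER` joined to the dynamic part with `#`. `PrefixedKey` and `UserStore` are hypothetical names used only for illustration, and `_raw_query` stands in for the underlying query class method. In PynamoDB you would typically put the prefixing logic in a custom `Attribute`'s `serialize`/`deserialize` methods and in the overridden `query`/`get` class methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixedKey:
    """Hypothetical helper: adds/strips a static prefix on a key value."""
    prefix: str
    sep: str = "#"

    def serialize(self, value: str) -> str:
        # "42" -> "USER#42"
        return f"{self.prefix}{self.sep}{value}"

    def deserialize(self, stored: str) -> str:
        head = f"{self.prefix}{self.sep}"
        if not stored.startswith(head):
            raise ValueError(f"{stored!r} does not start with {head!r}")
        return stored[len(head):]


class UserStore:
    """Stand-in for a model class (hypothetical); _items fakes the table."""
    hash_key = PrefixedKey("USER")
    _items = {"USER#42": [{"range": "meta", "name": "Bob"}]}

    @classmethod
    def _raw_query(cls, stored_hash_key):
        # Plays the role of the underlying query class method.
        return cls._items.get(stored_hash_key, [])

    @classmethod
    def query(cls, user_id):
        # Serialise the composite key *before* delegating, so callers
        # only ever pass the dynamic portion ("42").
        return cls._raw_query(cls.hash_key.serialize(user_id))


print(UserStore.query("42"))
```

The point of the override is that the static portion of the key lives in one place, so application code never builds `USER#...` strings by hand.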
I must admit that PynamoDB doesn't ~make this very easy~ encourage this strategy, but it is possible as long as you're careful.
You might be able to use a UnicodeDelimitedTupleAttribute to somehow formalize that "composite key" approach:
https://github.com/lyft/pynamodb-attributes/blob/master/pynamodb_attributes/unicode_delimited_tuple.py
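As a rough illustration of the "composite key as a typed tuple" idea, here is a plain-Python sketch. It is not the pynamodb-attributes implementation (whose exact constructor signature may differ). It joins a NamedTuple's fields with a delimiter and parses them back, converting each segment to the field's annotated type. `UserKey`, `to_key` and `from_key` are hypothetical names.

```python
from typing import NamedTuple, get_type_hints


class UserKey(NamedTuple):
    # Hypothetical key layout: entity type, id, and a version number.
    entity: str
    user_id: str
    version: int


def to_key(value: tuple, delimiter: str = "#") -> str:
    parts = [str(p) for p in value]
    if any(delimiter in p for p in parts):
        raise ValueError("field values must not contain the delimiter")
    return delimiter.join(parts)


def from_key(tuple_type: type, stored: str, delimiter: str = "#") -> tuple:
    hints = get_type_hints(tuple_type)
    fields = tuple_type._fields
    raw = stored.split(delimiter)
    if len(raw) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(raw)}")
    # Convert each string segment back to its annotated type (str, int, ...).
    return tuple_type(*(hints[name](part) for name, part in zip(fields, raw)))


key = UserKey("user", "ff1c09eb", 3)
stored = to_key(key)
print(stored)                     # user#ff1c09eb#3
print(from_key(UserKey, stored))  # UserKey(entity='user', user_id='ff1c09eb', version=3)
```

Rejecting values that contain the delimiter keeps the round trip unambiguous.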
It's fair to say that if your datastore is schema-less, though you might have a schema in the application layer, you don't need to involve your datastore in that by naming your tables after your schemas (or even having a table per schema). I'll go ahead and say that's not how we've been using DynamoDB here at Lyft, but I'd be curious to see examples of this design approach applied to a canonical example app, e.g. a blogging system with users, posts, analytics, etc.
Thanks @ikonst - the UnicodeDelimitedTupleAttribute is definitely a good way to manage composite keys; it's very similar to the approach I've taken on my projects.
On the back of the custom in-application work I've been doing, I've started working (very casually) on a library which uses PynamoDB's models and some Attribute classes, but encourages the concept of modelling your data as DynamoDB partitions rather than tables (i.e. one table, many models).
The potential danger of this approach is hot partitions on DynamoDB. But as long as your data either a) will fit inside one DynamoDB partition, or b) can use a significant spread of hash (partition) key values, then it should be easy to overcome.
Your blogging system example, using the concept of "models-as-partitions" (one table, many models), could yield data in DynamoDB which looks something like this:
hash_key | range_key | created_dt | content | username | value |
---|---|---|---|---|---|
posts | post1 | 1569038270 | Some post | | |
posts | post2 | 1569038270 | Another post | | |
posts | post3 | 1569038270 | Third post | | |
users | user1 | 1569038270 | | Bob | |
users | user2 | 1569038270 | | Roger | |
users | user3 | 1569038270 | | Tim | |
counters | post1 | 1569038270 | | | 23 |
counters | post2 | 1569038270 | | | 44 |
counters | post3 | 1569038270 | | | 64 |
Your Python models could then define a single base class for each logical partition, i.e. a Posts model, a Users model, etc.
However, given the small number of hash_key values, you'd need to make sure all your data would fit within one DynamoDB partition (per a) above).
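To make the "models-as-partitions" layout concrete, this in-memory sketch stores a few rows of the blogging example as plain dicts and answers a query the way a DynamoDB Query would: exact match on the hash key, an optional `begins_with`-style filter on the range key, and results ordered by range key. `TABLE` and `query_partition` are hypothetical stand-ins; a real implementation would go through the model's `query` method.

```python
from typing import Dict, List, Optional

# A few rows from the example; empty cells are simply absent keys.
TABLE: List[Dict[str, object]] = [
    {"hash_key": "posts", "range_key": "post2", "created_dt": 1569038270, "content": "Another post"},
    {"hash_key": "users", "range_key": "user1", "created_dt": 1569038270, "username": "Bob"},
    {"hash_key": "posts", "range_key": "post1", "created_dt": 1569038270, "content": "Some post"},
    {"hash_key": "counters", "range_key": "post1", "created_dt": 1569038270, "value": 23},
]


def query_partition(hash_key: str, range_prefix: Optional[str] = None) -> List[Dict[str, object]]:
    """Mimic a Query: exact hash-key match, optional range-key prefix,
    results sorted ascending by range key."""
    rows = [r for r in TABLE if r["hash_key"] == hash_key]
    if range_prefix is not None:
        rows = [r for r in rows if str(r["range_key"]).startswith(range_prefix)]
    return sorted(rows, key=lambda r: str(r["range_key"]))


# Each "model" is just a view over one partition.
print([r["content"] for r in query_partition("posts")])  # ['Some post', 'Another post']
print(query_partition("counters", "post1")[0]["value"])  # 23
```

With only three hash key values, every read and write for a model lands on the same key, which is why the single-partition size limit matters here.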
Note though that time-based analytics would not be a good use-case for a single-table pattern (per DynamoDB's best practices docs).
A (somewhat contrived) example of a data schema which I feel avoids hot partitions is as follows:
hash_key | range_key | created_dt | username | full_name | content | owner_id | post_id | primary_email |
---|---|---|---|---|---|---|---|---|
ff1c09eb-2e67-4773-a8ae-6d9e48055034#user | meta | 1569038771 | bob1 | Bob Richards | | ff1c09eb-2e67-4773-a8ae-6d9e48055034 | | |
ff1c09eb-2e67-4773-a8ae-6d9e48055034#user | email_data | 1569038771 | | | | ff1c09eb-2e67-4773-a8ae-6d9e48055034 | | [email protected] |
ff1c09eb-2e67-4773-a8ae-6d9e48055034#user | comments#1569038771 | 1569038771 | | | This is bob's comment | ff1c09eb-2e67-4773-a8ae-6d9e48055034 | 841f97a1-2004-4ee5-907d-cc24f928f8cb | |
17a4b8d2-79fa-4564-ab2c-bdc9dd4ce4b5#user | meta | 1569038771 | tim2 | Tim Nice | | 17a4b8d2-79fa-4564-ab2c-bdc9dd4ce4b5 | | |
35c208e9-6057-40c8-b062-1f722f07d5f6#user | meta | 1569038771 | roger3 | Roger Dodger | | 35c208e9-6057-40c8-b062-1f722f07d5f6 | | |
841f97a1-2004-4ee5-907d-cc24f928f8cb#post | posts | 1569039058 | | | This is a post body | | 841f97a1-2004-4ee5-907d-cc24f928f8cb | |
You could add GSIs for owner_id and post_id to allow retrieval of all data belonging to a user or post.
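A GSI on `owner_id` effectively re-indexes the same items under a different partition key. This sketch emulates that in memory with a handful of rows from the schema example (UUIDs shortened, and assuming the post row carries its own id in `post_id`). `build_index` is a hypothetical helper; a real GSI is maintained by DynamoDB, not by application code. Items without the indexed attribute are skipped, which mirrors how a sparse GSI behaves.

```python
from collections import defaultdict
from typing import Dict, List

# Rows from the example schema, with UUIDs shortened for readability.
ITEMS: List[Dict[str, str]] = [
    {"hash_key": "ff1c#user", "range_key": "meta", "username": "bob1", "owner_id": "ff1c"},
    {"hash_key": "ff1c#user", "range_key": "email_data", "owner_id": "ff1c"},
    {"hash_key": "ff1c#user", "range_key": "comments#1569038771",
     "content": "This is bob's comment", "owner_id": "ff1c", "post_id": "841f"},
    {"hash_key": "17a4#user", "range_key": "meta", "username": "tim2", "owner_id": "17a4"},
    {"hash_key": "841f#post", "range_key": "posts",
     "content": "This is a post body", "post_id": "841f"},
]


def build_index(items: List[Dict[str, str]], attribute: str) -> Dict[str, List[Dict[str, str]]]:
    """Group items by one attribute, like a GSI whose partition key is `attribute`.
    Items missing the attribute are left out (a sparse index)."""
    index: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for item in items:
        if attribute in item:
            index[item[attribute]].append(item)
    return dict(index)


by_post = build_index(ITEMS, "post_id")
print([i["range_key"] for i in by_post["841f"]])    # ['comments#1569038771', 'posts']
print(len(build_index(ITEMS, "owner_id")["ff1c"]))  # 3
```

One lookup on the `post_id` index returns both the post itself and the comments on it, even though they live under different hash keys in the base table.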
I'm working on a project that follows the linked best practice for data storage. As it stands, there doesn't seem to be any documented support for this access and storage pattern. Is there any plan to eventually include this in the documentation?
@grvhi Do you have any further reading on your approach?
In the meantime, I'm going to make a small attempt at this and report back on my findings for your proposed approach using UnicodeDelimitedTupleAttribute.
@curlywurlycraig - I don't have any documentation or currently-available reference implementation. However, I will say that the UnicodeDelimitedTupleAttribute is definitely a good starting point, based on what I've come across going down this road so far.
The library I'm working on (which I refer to briefly above) has implemented a class I've called Partition which uses PynamoDB's MetaModel and AttributeContainer to offer similar functionality to PynamoDB's Model, but with some workarounds for inheriting the Meta class and correctly serialising composite keys. However, the main focus of the library is to tightly couple graphql-core-next, a "query planner" and results caching. It's very (very) early days at this stage, but I could share it if you think it would be useful.
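As a rough illustration of the kind of Meta-inheritance workaround described here (not the code from the library or gist, and not PynamoDB's `MetaModel`), a small plain-Python metaclass can merge each class's `Meta` with its parents'. A subclass then declares only the option that differs (a hypothetical `partition` suffix) while inheriting shared ones like `table_name`. `SingleTableMeta`, `SingleTableBase`, `User` and `Post` are all hypothetical names.

```python
from typing import Any, Dict


class SingleTableMeta(type):
    """Hypothetical metaclass: merges each class's Meta options with its parents'."""

    def __new__(mcs, name, bases, namespace):
        merged: Dict[str, Any] = {}
        # Apply bases in reverse so the first-listed base wins on conflicts.
        for base in reversed(bases):
            merged.update(getattr(base, "_meta_options", {}))
        own_meta = namespace.get("Meta")
        if own_meta is not None:
            # Copy user-declared options, skipping dunder entries like __module__.
            merged.update({k: v for k, v in vars(own_meta).items() if not k.startswith("__")})
        namespace["_meta_options"] = merged
        namespace["Meta"] = type("Meta", (), dict(merged))
        return super().__new__(mcs, name, bases, namespace)


class SingleTableBase(metaclass=SingleTableMeta):
    class Meta:
        table_name = "app_table"
        region = "us-east-1"

    @classmethod
    def make_hash_key(cls, entity_id: str) -> str:
        # Composite hash key "<id>#<partition>", as in the example schema.
        return f"{entity_id}#{cls.Meta.partition}"


class User(SingleTableBase):
    class Meta:
        partition = "user"  # only the differing option is declared


class Post(SingleTableBase):
    class Meta:
        partition = "post"


print(User.Meta.table_name, User.make_hash_key("ff1c"))  # app_table ff1c#user
print(Post.make_hash_key("841f"))                        # 841f#post
```

Without the merge, a subclass that defines its own `Meta` would shadow the parent's entirely and lose `table_name`.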
That could be pretty helpful for me. I'm currently having a crack at creating a very minimal similar thing for our use cases, so having a reference would be helpful if you're interested in sharing the source.
@curlywurlycraig - here's a gist: https://gist.github.com/grvhi/78889f32b3701c421ef30e72aebc7f69
I've tried to remove anything which related specifically to the integration of graphql-core-next; hopefully I've removed neither too much nor too little! Happy to address comments/questions (if you have any) on the gist itself.
Note that there's still a lot of work to be done here and I'm very much open to the idea that I've gone down the wrong path! Would be interested to hear your thoughts.
@grvhi Thank you! I'll take a look at this. I have my own piece of code at the moment and will compare. Your approach looks more comprehensive than mine.
How have you guys progressed in using pynamo for single-table design? I'm working on a new project and, after reading lots of AWS documentation, blogs, etc., it's clear that they promote 1-app/1-table. It's not clear how to implement this in pynamodb, or whether native support will eventually land.
Seems like even AWS's own internal team didn't implement adjacency lists when doing this for AppSync.
@ricky-sb I've had an AWS engineer tell me single-table design makes many-to-many (m2m) relationships hard at scale, but publicly they recommend single-table design:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html#bp-general-nosql-design-concepts
You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table.
I've built a Python library called ~Dokklib-DB~ for the DynamoDB single-table pattern.
~Dokklib-DB~ takes a different philosophy from PynamoDB in that it follows the DynamoDB API closely, so if you have used Boto3 before, it will feel familiar.
Features:
- Simple, Pythonic query interface on top of Boto3. No more nested dict literals!
- Type safety for primary keys and indices (for documentation and data integrity).
- Easy error handling.
- Full type hint & unit test coverage + integration testing.
~Docs: https://dokklib.com/libs/db/~
Update 2022-03-11: The project is now archived, but the source is still available at https://github.com/dokklib/dokklib-db
@abiro this looks really nice! great work! hm now I want to give it a try... using multi-table atm, def bookmarking, thank you
We're working on a fork(ish) of PynamoDB that's built for single table design. Same interface, but works with multiple models on a single table: https://github.com/3mcloud/falcano
If the ability to query multiple polymorphic models at the same time was added, then I think the single-table use case would be more-or-less covered. E.g:
```python
from pynamodb.attributes import DiscriminatorAttribute, UnicodeAttribute
from pynamodb.models import Model


class ParentModel(Model):
    class Meta:
        table_name = 'polymorphic_table'

    id = UnicodeAttribute(hash_key=True)
    sort = UnicodeAttribute(range_key=True)
    cls = DiscriminatorAttribute()  # stores 'Foo' / 'Bar' to pick the subclass


class FooModel(ParentModel, discriminator='Foo'):
    foo = UnicodeAttribute()


class BarModel(ParentModel, discriminator='Bar'):
    bar = UnicodeAttribute()


items = ParentModel.query('some id')
# Items contains instances of both FooModel and BarModel, depending on the discriminator property
```
All models on the same table would be sub-classes of a single parent model.
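Conceptually, a discriminator maps a stored type tag to a model class, so one query over a shared hash key can yield a mix of subclasses. This plain-Python sketch shows only that dispatch step. `ParentRecord`, `from_item` and the registry are hypothetical and are not PynamoDB internals; the `discriminator=` class keyword simply mirrors the syntax in the snippet above via `__init_subclass__`.

```python
from typing import Any, ClassVar, Dict, Type


class ParentRecord:
    """Hypothetical stand-in for a polymorphic parent model."""
    _registry: ClassVar[Dict[str, Type["ParentRecord"]]] = {}

    def __init_subclass__(cls, discriminator: str = "", **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if discriminator:
            ParentRecord._registry[discriminator] = cls  # tag -> class

    def __init__(self, **attrs: Any) -> None:
        self.__dict__.update(attrs)

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "ParentRecord":
        # Pick the subclass named by the stored tag; the tag is consumed,
        # not kept as an attribute on the instance.
        attrs = dict(item)
        target = ParentRecord._registry[attrs.pop("cls")]
        return target(**attrs)


class FooRecord(ParentRecord, discriminator="Foo"):
    pass


class BarRecord(ParentRecord, discriminator="Bar"):
    pass


# Raw items as a single query on one hash key might return them.
raw = [
    {"id": "some id", "sort": "a", "cls": "Foo", "foo": "x"},
    {"id": "some id", "sort": "b", "cls": "Bar", "bar": "y"},
]
items = [ParentRecord.from_item(i) for i in raw]
print([type(i).__name__ for i in items])  # ['FooRecord', 'BarRecord']
```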
With issue #1004 present, polymorphic models are not an option for me right now 😢
It would be really nice to have support for single-table design for DynamoDB at the hash level, without using additional columns (like typedorm). I believe this is how most people are using it, at least in most of the examples I see, and it's what we are using. I'm guessing pynamodb would be even more useful in these cases for querying with single-table design, and it would gain wider adoption. :)
@bafonso what do you mean by "at hash level"?
Sorry, my terminology is not very accurate. I basically meant support for prefixing keys with the type, i.e. USER#
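One way to read that request: infer the entity type from the key prefix itself (e.g. `USER#...`), so no separate type column is stored. A minimal plain-Python sketch, assuming `#` as the separator and hypothetical `User`/`Post` entity classes; this is illustrative only, not an existing PynamoDB feature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class User:
    user_id: str


@dataclass
class Post:
    post_id: str


Entity = Union[User, Post]

# Hypothetical mapping from key prefix to the entity's constructor.
PREFIXES: Dict[str, Callable[[str], Entity]] = {"USER": User, "POST": Post}


def make_key(entity: Entity) -> str:
    if isinstance(entity, User):
        return f"USER#{entity.user_id}"
    return f"POST#{entity.post_id}"


def parse_key(pk: str) -> Entity:
    # Split only on the first separator so ids may themselves contain '#'.
    prefix, _, rest = pk.partition("#")
    try:
        return PREFIXES[prefix](rest)
    except KeyError:
        raise ValueError(f"unknown key prefix {prefix!r}") from None


print(parse_key("USER#ff1c"))   # User(user_id='ff1c')
print(make_key(Post("841f")))   # POST#841f
```

The trade-off versus a discriminator column is that the type is fixed by the key, so changing an item's type means rewriting its key.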
Re: the proposal above to query multiple polymorphic models at once, with all models on the same table as sub-classes of a single parent model: Agree completely with this approach.