Multiple Models on top of One DynamoDB Table

Open akashg90 opened this issue 5 years ago • 20 comments

Hi All,

The DynamoDB best practices suggest that "most well designed applications require only one table". Has anyone tried to fit all relational entities in one DynamoDB table and access them through PynamoDB?

If you have tried to follow this best practice, please share your experience.

Thanks, Akash

akashg90 avatar Sep 01 '19 09:09 akashg90

I've done this using PynamoDB, but you have to be very conscious of how you manage your model's Meta classes, hash keys etc (if you implement inheritance).

The main thing to be careful of is how you manage your composite keys (assuming you're using composite keys for either your hash or range keys). I've found that using a custom Attribute for hash/range keys helps a lot and this strategy can enable some really powerful schemas on top of a single DynamoDB table. Especially now that Transactions are a thing.

Note too that you will likely have to over-ride the query/get class methods to ensure your composite keys are serialised (for example: prefixing a static value to the dynamic portion of the composite key) before calling the underlying query/get class methods.
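To make the prefixing idea concrete, here is a minimal sketch in plain Python. The delimiter and the function names are assumptions for illustration, not part of PynamoDB; in a real model this logic would probably live in a custom Attribute's serialize/deserialize methods or in the overridden query/get class methods.

```python
from typing import Tuple

DELIMITER = "#"  # assumed separator; pick any character that never appears in your IDs


def serialize_key(prefix: str, value: str) -> str:
    """Build the stored key from a static prefix and a dynamic value."""
    if DELIMITER in prefix:
        raise ValueError("prefix must not contain the delimiter")
    return f"{prefix}{DELIMITER}{value}"


def deserialize_key(stored: str) -> Tuple[str, str]:
    """Split a stored key back into (prefix, value) at the first delimiter."""
    prefix, _, value = stored.partition(DELIMITER)
    return prefix, value
```

Callers would pass the dynamic portion only (for example a user ID) and let the wrapper add the prefix before it reaches the underlying query/get call.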

I must admit that PynamoDB doesn't ~make this very easy~ encourage this strategy, but it is possible as long as you're careful.

grvhi avatar Sep 20 '19 07:09 grvhi

You might be able to use a UnicodeDelimitedTupleAttribute to somehow formalize that "composite key" approach: https://github.com/lyft/pynamodb-attributes/blob/master/pynamodb_attributes/unicode_delimited_tuple.py
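For anyone who has not seen that attribute, the underlying idea is a typed tuple that round-trips through a delimited string. The sketch below is a plain-Python analogue, not the library's code: `CommentKey`, `dump_tuple`, `load_tuple` and the `#` delimiter are all made up for illustration.

```python
from typing import NamedTuple, Type, TypeVar, get_type_hints

T = TypeVar("T", bound=tuple)


class CommentKey(NamedTuple):
    """Hypothetical composite range key: an entity kind plus a timestamp."""
    kind: str
    created_dt: int


def dump_tuple(key: tuple, delimiter: str = "#") -> str:
    """Join the tuple's parts into a single string for storage."""
    parts = [str(part) for part in key]
    if any(delimiter in part for part in parts):
        raise ValueError("a key part contains the delimiter")
    return delimiter.join(parts)


def load_tuple(raw: str, tuple_type: Type[T], delimiter: str = "#") -> T:
    """Split the stored string and cast each part to its annotated type."""
    hints = get_type_hints(tuple_type)
    fields = tuple_type._fields
    values = raw.split(delimiter)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} parts, got {len(values)}")
    return tuple_type(*(hints[name](value) for name, value in zip(fields, values)))
```

The benefit over ad hoc string formatting is that application code only ever handles the typed tuple, and the string form exists only at the storage boundary.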

It's fair to say that if your datastore is schema-less, though you might have a schema in the application layer, you don't need to involve your datastore in that by naming your tables after your schemas (or even having a table per schema). I'll go ahead and say that's not how we've been using DynamoDB here at Lyft, but I'll be curious to see examples of this design approach being applied to some canonical app example, e.g. the design of a blogging system with users, posts, analytics, etc.

ikonst avatar Sep 21 '19 02:09 ikonst

Thanks @ikonst - the UnicodeDelimitedTupleAttribute is definitely a good way to manage composite keys; it's very similar to the approach I've taken on my projects.

On the back of the custom in-application work I've been doing, I've started working (very casually) on a library which uses PynamoDB's models and some Attribute classes, but encourages the concept of modelling your data as DynamoDB partitions rather than tables (i.e. one table, many models).

The potential danger of this approach is hot partitions on DynamoDB. But as long as your data either a) will fit inside one DynamoDB partition, or b) can use a significant spread of hash (partition) key values, then it should be easy to overcome.

Your blogging system example, using the concept of "models-as-partitions" (one table, many models), could yield data in DynamoDB which looks something like this:

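| hash_key | range_key | created_dt | content | username | value |
|----------|-----------|------------|--------------|----------|-------|
| posts | post1 | 1569038270 | Some post | | |
| posts | post2 | 1569038270 | Another post | | |
| posts | post3 | 1569038270 | Third post | | |
| users | user1 | 1569038270 | | Bob | |
| users | user2 | 1569038270 | | Roger | |
| users | user3 | 1569038270 | | Tim | |
| counters | post1 | 1569038270 | | | 23 |
| counters | post2 | 1569038270 | | | 44 |
| counters | post3 | 1569038270 | | | 64 |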

Your python models could then define a single base class for each logical partition. i.e. A Posts model, Users model etc.
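As a rough illustration of "one base class per logical partition", here is a sketch using plain dataclasses rather than PynamoDB models. The class names and the `PARTITION` constant are hypothetical; the attribute names follow the table above.

```python
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict


@dataclass
class PartitionItem:
    """Base class: every subclass pins its items to one fixed hash_key."""
    PARTITION: ClassVar[str] = ""
    range_key: str
    created_dt: int

    def to_item(self) -> Dict[str, Any]:
        # ClassVar fields are not dataclass fields, so hash_key is added by hand.
        item = asdict(self)
        item["hash_key"] = self.PARTITION
        return item


@dataclass
class Post(PartitionItem):
    PARTITION: ClassVar[str] = "posts"
    content: str = ""


@dataclass
class User(PartitionItem):
    PARTITION: ClassVar[str] = "users"
    username: str = ""
```

With this shape, `Post("post1", 1569038270, "Some post").to_item()` yields a dict whose `hash_key` is always `"posts"`, so callers never set the partition themselves.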

However, given the small number of hash_key values, you'd need to make sure all your data would fit within one DynamoDB partition (per a) above).
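A quick way to see the concern is to count items per hash key. The sketch below does that for the sample rows from the table above, held in memory; nothing here talks to DynamoDB.

```python
from collections import Counter

# Sample rows from the blogging example, reduced to their key attributes.
items = [
    {"hash_key": "posts", "range_key": "post1"},
    {"hash_key": "posts", "range_key": "post2"},
    {"hash_key": "posts", "range_key": "post3"},
    {"hash_key": "users", "range_key": "user1"},
    {"hash_key": "users", "range_key": "user2"},
    {"hash_key": "users", "range_key": "user3"},
    {"hash_key": "counters", "range_key": "post1"},
    {"hash_key": "counters", "range_key": "post2"},
    {"hash_key": "counters", "range_key": "post3"},
]

# Every item lands under one of only three hash key values.
items_per_hash_key = Counter(item["hash_key"] for item in items)
```

However many items are added, the number of distinct hash keys stays at three, which is why all reads and writes for a model concentrate on one key.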

Note though that time-based analytics would not be a good use-case for a single-table pattern (per DynamoDB's best practices docs).

A (somewhat contrived) example of a data schema which I feel avoids hot partitions is as follows:

| hash_key | range_key | created_dt | username | full_name | content | owner_id | post_id | primary_email |
|----------|-----------|------------|----------|-----------|---------|----------|---------|---------------|
| ff1c09eb-2e67-4773-a8ae-6d9e48055034#user | meta | 1569038771 | bob1 | Bob Richards | | ff1c09eb-2e67-4773-a8ae-6d9e48055034 | | |
| ff1c09eb-2e67-4773-a8ae-6d9e48055034#user | email_data | 1569038771 | | | | ff1c09eb-2e67-4773-a8ae-6d9e48055034 | | [email protected] |
| ff1c09eb-2e67-4773-a8ae-6d9e48055034#user | comments#1569038771 | 1569038771 | | | This is bob's comment | ff1c09eb-2e67-4773-a8ae-6d9e48055034 | 841f97a1-2004-4ee5-907d-cc24f928f8cb | |
| 17a4b8d2-79fa-4564-ab2c-bdc9dd4ce4b5#user | meta | 1569038771 | tim2 | Tim Nice | | 17a4b8d2-79fa-4564-ab2c-bdc9dd4ce4b5 | | |
| 35c208e9-6057-40c8-b062-1f722f07d5f6#user | meta | 1569038771 | roger3 | Roger Dodger | | 35c208e9-6057-40c8-b062-1f722f07d5f6 | | |
| 841f97a1-2004-4ee5-907d-cc24f928f8cb#post | posts | 1569039058 | | | This is a post body | | 841f97a1-2004-4ee5-907d-cc24f928f8cb | |

You could add GSIs for owner_id and post_id to allow for retrieval of all data belonging to a user or post.
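For reference, the index definitions for those two GSIs might look something like the sketch below. It only builds the dict shapes that boto3's `create_table` accepts for `GlobalSecondaryIndexes` and `AttributeDefinitions`; the helper names and the `<attribute>-index` naming convention are assumptions, and both attributes are assumed to be strings.

```python
from typing import Any, Dict, List


def gsi_definition(attribute: str) -> Dict[str, Any]:
    """One GSI keyed on a single attribute, projecting all item attributes."""
    return {
        "IndexName": f"{attribute}-index",  # hypothetical naming convention
        "KeySchema": [{"AttributeName": attribute, "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},
    }


def attribute_definition(attribute: str) -> Dict[str, str]:
    """Declare the indexed attribute as a string ("S")."""
    return {"AttributeName": attribute, "AttributeType": "S"}


INDEXED_ATTRIBUTES: List[str] = ["owner_id", "post_id"]

global_secondary_indexes = [gsi_definition(name) for name in INDEXED_ATTRIBUTES]
attribute_definitions = [attribute_definition(name) for name in INDEXED_ATTRIBUTES]
```

A real `create_table` call would also need the table's own key attributes in `AttributeDefinitions`, and a `ProvisionedThroughput` entry per index when the table uses provisioned rather than on-demand billing. Items that lack the indexed attribute simply do not appear in that index, which suits this schema since most rows carry only one of the two IDs.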

grvhi avatar Sep 21 '19 04:09 grvhi

I'm working on a project that is following the linked best practise for data storage. As it stands, there doesn't seem to be any documented support for this access and storage pattern. Is there any goal to eventually include this as part of the documentation?

@grvhi Do you have any further reading on your approach?

In the meantime, I'm going to make a small attempt at this and report back on my findings for your proposed approach using UnicodeDelimitedTupleAttribute.

curlywurlycraig avatar Sep 23 '19 10:09 curlywurlycraig

@curlywurlycraig - I don't have any documentation or currently-available reference implementation. However, I will say that the UnicodeDelimitedTupleAttribute is definitely a good starting point, based on what I've come across going down this road, so far.

The library I'm working on (which I refer to briefly above) has implemented a class I've called Partition which uses PynamoDB's MetaModel and AttributeContainer to offer similar functionality to PynamoDB's Model, but with some workarounds for inheriting the Meta class and correctly serialising composite keys. However, the main focus of the library is to tightly couple graphql-core-next, a "query planner" and results caching. It's very (very) early days at this stage, but I could share it if you think it will be useful.

grvhi avatar Sep 23 '19 14:09 grvhi

That could be pretty helpful for me. I'm currently having a crack at creating a very minimal similar thing for our use cases, so having a reference would be helpful if you're interested in sharing the source.

curlywurlycraig avatar Sep 23 '19 14:09 curlywurlycraig

@curlywurlycraig - here's a gist: https://gist.github.com/grvhi/78889f32b3701c421ef30e72aebc7f69

I've tried to remove anything which related specifically to the integration of graphql-core-next; hopefully I've neither removed too much, nor too little! Happy to address comments/questions (if you have any) on the gist itself.

Note that there's still a lot of work to be done here and I'm very much open to the idea that I've gone down the wrong path! Would be interested to hear your thoughts.

grvhi avatar Sep 23 '19 14:09 grvhi

@grvhi Thank you! I'll take a look at this. I have my own piece of code at the moment and will compare. Your approach looks more comprehensive than mine.

curlywurlycraig avatar Sep 23 '19 14:09 curlywurlycraig

How have you guys progressed in using pynamo for single-table design? I'm working on a new project and after reading lots of aws documentation, blogs, etc, it's clear that they promote 1-app/1-table. It's not clear how to implement this in pynamodb or whether native support will eventually land?

erichaus avatar Dec 11 '19 22:12 erichaus

Seems like even AWS's own internal team didn't implement adjacency lists when doing this for AppSync.

ricky-sb avatar Dec 12 '19 19:12 ricky-sb

@ricky-sb I've had an aws engineer tell me single table makes m2m hard at scale, but publicly they are recommending single table design: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html#bp-general-nosql-design-concepts

"You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table."

erichaus avatar Dec 12 '19 21:12 erichaus

I've built a Python library called ~Dokklib-DB~ for the DynamoDB single table pattern.

~Dokklib-DB~ takes a different philosophy from PynamoDB in that it follows the DynamoDB API closely, so if you've used Boto3 before, it will feel familiar.

Features:

  • Simple, Pythonic query interface on top of Boto3. No more nested dict literals!
  • Type safety for primary keys and indices (for documentation and data integrity).
  • Easy error handling.
  • Full type hint & unit test coverage + integration testing.

~Docs: https://dokklib.com/libs/db/~

Update 2022-03-11: The project is now archived, but the source is still available at https://github.com/dokklib/dokklib-db

agostbiro avatar Feb 25 '20 19:02 agostbiro

@abiro this looks really nice! great work! hm now I want to give it a try... using multi-table atm, def bookmarking, thank you

erichaus avatar Feb 25 '20 20:02 erichaus

We're working on a fork(ish) of PynamoDB that's built for single table design. Same interface, but works with multiple models on a single table: https://github.com/3mcloud/falcano

erictwalker18 avatar Aug 13 '20 18:08 erictwalker18

If the ability to query multiple polymorphic models at the same time was added, then I think the single-table use case would be more-or-less covered. E.g:

from pynamodb.attributes import DiscriminatorAttribute, UnicodeAttribute
from pynamodb.models import Model

class ParentModel(Model):
    class Meta:
        table_name = 'polymorphic_table'
    id = UnicodeAttribute(hash_key=True)
    sort = UnicodeAttribute(range_key=True)
    # Stores which subclass each item belongs to
    cls = DiscriminatorAttribute()

class FooModel(ParentModel, discriminator='Foo'):
    foo = UnicodeAttribute()

class BarModel(ParentModel, discriminator='Bar'):
    bar = UnicodeAttribute()

items = ParentModel.query('some id')
# Items contains instances of both FooModel and BarModel, depending on the discriminator property

All models on the same table would be sub-classes of a single parent model.
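To show the dispatch mechanism being asked for, here is a plain-Python sketch that maps raw items to subclasses through a discriminator value. It is not PynamoDB code: `ParentItem`, `from_raw` and the registry are hypothetical, and only the `discriminator='...'` class keyword and the `cls` attribute name mirror the example above.

```python
from typing import Any, ClassVar, Dict, Optional, Type


class ParentItem:
    """Base class that remembers which subclass owns each discriminator value."""
    _registry: ClassVar[Dict[str, Type["ParentItem"]]] = {}

    def __init_subclass__(cls, discriminator: Optional[str] = None, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if discriminator is not None:
            ParentItem._registry[discriminator] = cls

    def __init__(self, **attributes: Any) -> None:
        self.attributes = attributes

    @classmethod
    def from_raw(cls, raw: Dict[str, Any]) -> "ParentItem":
        # Fall back to the class from_raw was called on when the value is unknown.
        model = cls._registry.get(raw.get("cls"), cls)
        return model(**raw)


class FooItem(ParentItem, discriminator="Foo"):
    pass


class BarItem(ParentItem, discriminator="Bar"):
    pass


# Two raw items sharing one hash key, as a query on the parent might return them.
raw_items = [
    {"id": "some id", "sort": "a", "cls": "Foo", "foo": "x"},
    {"id": "some id", "sort": "b", "cls": "Bar", "bar": "y"},
]
items = [ParentItem.from_raw(raw) for raw in raw_items]
```

The point is that a single query on the parent can yield a mixed list, with each element already an instance of the right subclass.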

wisaac407 avatar Feb 04 '21 18:02 wisaac407

#1004 with this issue present, polymorphic models are out of the choices for me right now 😢

mrsakkaro avatar Feb 03 '22 08:02 mrsakkaro

It would be really nice to have support for single table design for DynamoDB at hash level, without using additional columns (like typedorm). I believe this is how most people are using it, at least in most of the examples I see, and that's what we are using. I'm guessing pynamodb would be even more useful in these cases for querying with single table design, and it would gain wider adoption. :)

bafonso avatar Feb 08 '22 16:02 bafonso

@bafonso what do you mean by "at hash level"?

erichaus avatar Feb 14 '22 18:02 erichaus

@bafonso what do you mean by "at hash level"?

Sorry, my terminology is not very accurate. I basically meant support for prefixing keys with type, i.e. USER#, PRODUCT#<product_id> etc.
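A small in-memory sketch of why type prefixes are useful: a partition can be filtered by sort key prefix, which is the same shape as a DynamoDB query with a `begins_with` condition on the sort key. The `pk`/`sk` attribute names, the IDs and the `query` helper are all made up for illustration.

```python
from typing import Dict, List

Item = Dict[str, str]

# Hypothetical items; real IDs and attribute names will differ.
TABLE: List[Item] = [
    {"pk": "USER#u1", "sk": "PROFILE", "name": "Bob"},
    {"pk": "USER#u1", "sk": "ORDER#2022-03-18"},
    {"pk": "USER#u1", "sk": "ORDER#2022-01-05"},
    {"pk": "PRODUCT#p9", "sk": "META"},
]


def query(items: List[Item], pk: str, sk_prefix: str = "") -> List[Item]:
    """Return one partition's items whose sort key starts with sk_prefix."""
    matches = [
        item for item in items
        if item["pk"] == pk and item["sk"].startswith(sk_prefix)
    ]
    # DynamoDB returns query results ordered by sort key; mimic that here.
    return sorted(matches, key=lambda item: item["sk"])
```

Here `query(TABLE, "USER#u1", "ORDER#")` returns only that user's orders, while `query(TABLE, "USER#u1")` returns every item type stored under the user.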

bafonso avatar Mar 18 '22 14:03 bafonso

If the ability to query multiple polymorphic models at the same time was added, then I think the single-table use case would be more-or-less covered. E.g:

class ParentModel(Model):
    class Meta:
        table_name = 'polymorphic_table'
    id = UnicodeAttribute(hash_key=True)
    sort = UnicodeAttribute(range_key=True)
    cls = DiscriminatorAttribute()

class FooModel(ParentModel, discriminator='Foo'):
    foo = UnicodeAttribute()

class BarModel(ParentModel, discriminator='Bar'):
    bar = UnicodeAttribute()

items = ParentModel.query('some id')
# Items contains instances of both FooModel, and BarModel depending on the discriminator property

All models on the same table would be sub-classes of a single parent model.

Agree completely with this approach.

OGoodness avatar Jul 08 '22 13:07 OGoodness