
DynamoDb Multi-Attribute Keys

aphex opened this issue 1 month ago • 1 comment

Announced yesterday: DynamoDB can now manage multi-attribute composite PKs and SKs for GSIs. This is a valuable upgrade for DynamoDB users. Performance still needs to be tested, but the assumption is that having access to native data types in keys will be a win. A more interesting direct benefit, however, is the ability to add new access patterns without having to backfill data: a GSI can be created across any existing attributes, and DynamoDB will handle querying and sparse indexing.

Reference

https://aws.amazon.com/blogs/database/multi-key-support-for-global-secondary-index-in-amazon-dynamodb/
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.DesignPattern.MultiAttributeKeys.html

Summary

This issue is to discuss how this could be implemented in ElectroDB, keeping the existing concatenation pattern while moving towards the natively managed solution.

Below are some initial thoughts and examples. The TL;DR: we introduce a new index type, `multi-attribute`, which forces the `pk` and `sk` properties to be arrays. These arrays are ordered just like the `KeySchema` is in the DynamoDB configuration. I also propose introducing a "magic string" to indicate the entity type. Luckily, ElectroDB has been writing this attribute to all items automatically, so users will have access to it when creating new access patterns.

Sample Code

import { Entity } from 'electrodb'

const User = new Entity(
  {
    model: {
      entity: 'user',
      service: 'user-directory',
      version: '1',
    },
    attributes: {
      userId: {
        type: 'string',
      },
      firstName: {
        type: 'string',
        required: true,
      },
      lastName: {
        type: 'string',
        required: true,
      },
      birthYear: {
        type: 'number',
      },
      petsOwned: {
        type: 'number',
      },
    },
    indexes: {
      byUserId: {
        pk: {
          field: 'pk',
          composite: ['userId'],
        },
        sk: {
          field: 'sk',
          composite: [],
        },
      },
      // Indicate this index is a multi-attribute index, so there are no "real" PK or SK attributes.
      // Switching `pk` to an array should allow TS to infer this is a different kind of index and also
      // enforce that `sk` must also be an array.
      // `collection` would not be an available option here when using a multi-attribute index.
      // There is also no casing, template, or casting of the index attributes, as they are read directly
      // from DynamoDB.
      byNameAndBirthYear: {
        index: 'name_born-gsi',
        type: 'multi-attribute',
        pk: ['lastName'],
        sk: ['birthYear'],
      },
      // In this scenario the user has added the `__edb_e__` attribute to the KeySchema as PK1.
      // This means we no longer need to filter on the entity name; the entity is now part
      // of the index PK.
      byEntity_NameAndBirthYear: {
        index: 'user_name_born-gsi',
        type: 'multi-attribute',
        pk: ['$entity', 'lastName'], // some special attribute to indicate the entity name
        sk: ['birthYear'],
      },
      // This is another potential multi-attribute index use case, where a user needs to query across
      // the whole entity partition. However, this may be such bad practice that we do not provide a
      // way to do it OOTB.
      byPetsOwned: {
        index: 'user-pets-gsi',
        type: 'multi-attribute',
        pk: ['$entity'], // some special attribute to indicate the entity name
        sk: ['petsOwned'],
      },
    },
  },
  { table: 'your_table_name' }
)

// Query users by last name, born after 1980.
// This index cannot guarantee the results are `User` entities, so we will need to use a filter.
// It is likely best to inform users that creating unique attributes for multi-attribute GSIs is
// a best practice when not using `$entity`.
// This pattern allows users to add new multi-attribute indexes to their existing data without backfilling.
User.query.byNameAndBirthYear({ lastName: 'Doe' }).gt({ birthYear: 1980 }).go()
// Results in
// {
//   TableName: 'UserTable',
//   IndexName: 'name_born-gsi',
//   KeyConditionExpression: '#lastName = :lastName AND #birthYear > :birthYear',
//   ExpressionAttributeNames: {
//     '#lastName': 'lastName',
//     '#birthYear': 'birthYear',
//     '#entity': '__edb_e__',
//   },
//   ExpressionAttributeValues: {
//     ':lastName': 'Doe',
//     ':birthYear': 1980,
//     ':entity': 'User',
//   },
//   "FilterExpression": "#entity = :entity"
// }

// Similar to the previous example but here we are adding the entity name to the index instead of reaching
// across all entities.
User.query.byEntity_NameAndBirthYear({ lastName: 'Doe' }).gt({ birthYear: 1980 }).go()
// Results in
// {
//   TableName: 'UserTable',
//   IndexName: 'user_name_born-gsi',
//   KeyConditionExpression: '#entity = :entity AND #lastName = :lastName AND #birthYear > :birthYear',
//   ExpressionAttributeNames: {
//     '#lastName': 'lastName',
//     '#birthYear': 'birthYear',
//     '#entity': '__edb_e__',
//   },
//   ExpressionAttributeValues: {
//     ':lastName': 'Doe',
//     ':birthYear': 1980,
//     ':entity': 'User',
//   }
// }

// Query all users that have more than 5 pets
// Note here it is possible there is no need for any PK attributes as it is querying across the whole entity
// partition. Electro could take care of this for you. This is obviously a dangerous pattern as it can easily
// lead to a hot partition.
User.query.byPetsOwned().gt({ petsOwned: 5 }).go()
// Results in
// {
//   TableName: 'UserTable',
//   IndexName: 'user-pets-gsi',
//   KeyConditionExpression: '#entity = :entity AND #petsOwned > :petsOwned',
//   ExpressionAttributeNames: {
//     '#entity': '__edb_e__',
//     '#petsOwned': 'petsOwned',
//   },
//   ExpressionAttributeValues: {
//     ':entity': 'User',
//     ':petsOwned': 5,
//   }
// }

These examples assume the following DynamoDB configuration:

{
  GlobalSecondaryIndexes: [
    {
      IndexName: 'name_born-gsi',
      KeySchema: [
        { AttributeName: 'lastName', KeyType: 'HASH' },   // GSI PK 1
        { AttributeName: 'birthYear', KeyType: 'RANGE' }, // GSI SK 1
      ],
      Projection: { ProjectionType: 'ALL' },
    },
    {
      IndexName: 'user_name_born-gsi',
      KeySchema: [
        { AttributeName: '__edb_e__', KeyType: 'HASH' },  // GSI PK 1
        { AttributeName: 'lastName', KeyType: 'HASH' },   // GSI PK 2
        { AttributeName: 'birthYear', KeyType: 'RANGE' }, // GSI SK 1
      ],
      Projection: { ProjectionType: 'ALL' },
    },
    {
      IndexName: 'user-pets-gsi',
      KeySchema: [
        { AttributeName: '__edb_e__', KeyType: 'HASH' },  // GSI PK 1
        { AttributeName: 'petsOwned', KeyType: 'RANGE' }, // GSI SK 1
      ],
      Projection: { ProjectionType: 'ALL' },
    },
  ],
}
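For concreteness, an index like `user_name_born-gsi` could be added to an existing table via the UpdateTable API. This is only a sketch: the table name is carried over from the examples above, the request shape follows the AWS SDK v3 `UpdateTableCommandInput`, and the actual SDK call is left commented out so nothing here claims to have been run against DynamoDB.

```typescript
// Sketch of an UpdateTable request that would create the multi-attribute
// GSI `user_name_born-gsi` on an existing table. Multiple HASH entries in
// one KeySchema is the new multi-attribute key feature.
const createGsiParams = {
  TableName: 'UserTable',
  AttributeDefinitions: [
    { AttributeName: '__edb_e__', AttributeType: 'S' },
    { AttributeName: 'lastName', AttributeType: 'S' },
    // A native number type participating in the key, one of the main wins.
    { AttributeName: 'birthYear', AttributeType: 'N' },
  ],
  GlobalSecondaryIndexUpdates: [
    {
      Create: {
        IndexName: 'user_name_born-gsi',
        KeySchema: [
          { AttributeName: '__edb_e__', KeyType: 'HASH' },
          { AttributeName: 'lastName', KeyType: 'HASH' },
          { AttributeName: 'birthYear', KeyType: 'RANGE' },
        ],
        Projection: { ProjectionType: 'ALL' },
      },
    },
  ],
};

// Usage (requires @aws-sdk/client-dynamodb):
// await new DynamoDBClient({}).send(new UpdateTableCommand(createGsiParams));
```

Because the index reads existing attributes directly, no synthetic key backfill is required before the index becomes usable.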

Additional Options

It is also worth noting that it may be valuable to introduce a new common attribute, `__edb_c__`, holding the collection name. This would allow collections to participate in multi-attribute indexes. It is not a requirement, but something to consider moving forward: a similar `$collection` token could be used in a multi-attribute index PK to get multiple entities from a collection. Since collection names are currently baked into SK concatenation, they will not be available for this new feature. However, when creating new entities, or new entity versions, we could start providing this option. By using `collection`, ElectroDB could know about this connection across entities.

However, it would also be possible for a user to simply use shared attributes to create a collection. For example, they could add a `collection` attribute to each entity and then add `collection` as a PK in a multi-attribute index's KeySchema. A query would then return multiple entities. This opens some very interesting options for cross-entity querying, and it may put some work back on ElectroDB to properly separate these entities for queries on a multi-attribute index that is not using the `$entity` template string.
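To illustrate what "properly separate these entities" could look like, here is a minimal sketch that groups raw query results by the `__edb_e__` marker ElectroDB already writes to every item. The item shape and entity names are illustrative assumptions, not the real ElectroDB response type.

```typescript
// Group raw query results by their entity marker. `__edb_e__` is the
// attribute ElectroDB writes to every item; the loose item type below is a
// sketch, not ElectroDB's actual response shape.
type RawItem = { __edb_e__?: string; [key: string]: unknown };

function groupByEntity(items: RawItem[]): Record<string, RawItem[]> {
  const groups: Record<string, RawItem[]> = {};
  for (const item of items) {
    // Items written outside ElectroDB may lack the marker entirely.
    const entity = item.__edb_e__ ?? '__unknown__';
    (groups[entity] ??= []).push(item);
  }
  return groups;
}

// Example: a cross-entity query on a shared `collection` attribute could
// return users and pets mixed together in one result set.
const results = groupByEntity([
  { __edb_e__: 'user', userId: 'u1' },
  { __edb_e__: 'pet', petId: 'p1' },
  { __edb_e__: 'user', userId: 'u2' },
]);
// results.user has 2 items, results.pet has 1
```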

aphex · Nov 21 '25 22:11

The following is a copy/paste of the initial conversation within the SST electrodb discord channel for posterity.

@aphex:

I was gonna pop in here and post the same GSI link. The blog post has some nice examples and a breakdown of how this differs from the concatenation-based composites we have now. https://aws.amazon.com/blogs/database/multi-key-support-for-global-secondary-index-in-amazon-dynamodb/

Wondering how much of a shift this would be for electro.

Would you want to go the route of opting an entity into native key composition and maintain both concatenation and native composite keys in ElectroDB, or release a breaking version with only native composite key support?

Though maybe this is better opted into on the index definition than the whole entity. That would allow existing ElectroDB entities to add native patterns.

Obviously this just came out, so it needs some time to see what the performance benefits are. Key concatenation does solve these same problems, and both approaches still require full left-to-right partial key querying. Native keys are limited to 4 attributes, though the post talks about maintaining DynamoDB performance at scale.
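A rough sketch of what left-to-right partial querying means for concatenated keys (the key template below is illustrative, not ElectroDB's exact internal format): a key prefix can only be built up to the first missing composite member, so attributes provided out of order cannot contribute to the key condition.

```typescript
// Build a begins_with-style sort key prefix from composite members in
// declaration order. The `$user#attr_value` template is an illustrative
// stand-in for a concatenated key format.
function buildSkPrefix(
  order: string[],
  provided: Record<string, string | number | undefined>,
): string {
  let prefix = '$user';
  for (const attr of order) {
    const value = provided[attr];
    // A member cannot be skipped: once one is missing, later members are
    // unusable in the key condition and must fall back to filters.
    if (value === undefined) break;
    prefix += `#${attr.toLowerCase()}_${String(value).toLowerCase()}`;
  }
  return prefix;
}

const full = buildSkPrefix(['lastName', 'birthYear'], { lastName: 'Doe', birthYear: 1980 });
// '$user#lastname_doe#birthyear_1980'

// Providing only birthYear cannot contribute to the key at all:
const partial = buildSkPrefix(['lastName', 'birthYear'], { birthYear: 1980 });
// '$user'
```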

I think the main benefit here would be the native data types. Being able to have a Boolean, Number, or Timestamp mixed with strings or other data types in your key. Not something I have struggled with much, but may have performance benefits or be blockers for others.

"You can use attributes that already exist in your table without backfilling synthetic keys across your data" is very interesting though.

@tywalch:

"Wondering how much of a shift this would be for electro."

With the caveat that I haven't had a chance to play with it yet, I am pretty confident that supporting this wouldn't introduce any breaking changes. My gut says this could just be configured on the individual index. Sort key comparison on multi-attribute keys would be different by default, but that could be documented.

That's from the user/API perspective; adopting it is a different story. I don't have a sense of scope for adoption because key management logic is spread around and poorly factored. I've known for a while that I need to rethink key building, but there hasn't really been a forcing function quite like this yet. That said, this is also why I invested in the sheer volume of tests I have around key management.

@aphex:

I agree I think support could be manageable. Just getting that so folks can add new access patterns without backfill would be a big step.

However the more I've been thinking about it I think there would need to be some branching here for concatenated keys vs multi-attribute.

Clustered indexes could now be handled natively with an attribute, as opposed to concatenation, and the same goes for collections. Another big benefit here is partial key patching/updating.

That giant table we worked through a while back on various sparse index possibilities? I believe that all disappears when each attribute can now be changed without the need for the others. This will likely change logic for both the update and patch methods, but only for multi-attribute indexes.

Figuring out a potential migration process for existing users, even if only through a guide, would also have value. This would require backfill, oddly enough, but in a different way: we would want to pull data that is currently only available in a concatenated key string and separate it into attributes, while also being aware of the 4/4 attribute limit.

Moving away from concatenated keys is starting to show more advantages the more I dig into it. Native data types, new access patterns without backfill, simpler sparse index updates, no update/patch key dependencies.
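The "no update/patch key dependencies" point can be sketched concretely. With concatenated keys, changing one composite member means recomputing the entire key string, which requires knowing every other member; with a multi-attribute index, the update expression only touches the changed attribute and DynamoDB maintains the index itself. Key format and helper names below are illustrative.

```typescript
// Concatenated key: updating `lastName` forces recomputing the full key
// string, so every composite member must be provided (or re-read first).
function buildConcatenatedSk(attrs: { lastName: string; birthYear: number }): string {
  return `$user#lastname_${attrs.lastName.toLowerCase()}#birthyear_${attrs.birthYear}`;
}

// Multi-attribute index: the update only sets the attribute that changed;
// no other key members are involved.
function buildMultiAttributeUpdate(lastName: string) {
  return {
    UpdateExpression: 'SET #lastName = :lastName',
    ExpressionAttributeNames: { '#lastName': 'lastName' },
    ExpressionAttributeValues: { ':lastName': lastName },
  };
}

// Concatenation: a lastName change still needs birthYear to rebuild the key.
const sk = buildConcatenatedSk({ lastName: 'Doe', birthYear: 1979 });
// '$user#lastname_doe#birthyear_1979'

// Multi-attribute: birthYear is not needed at all.
const update = buildMultiAttributeUpdate('Smith');
```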

@tywalch:

"Clustered indexes could now be handled natively with an attribute, as opposed to concatenation, and the same goes for collections. Another big benefit here is partial key patching/updating."

Yeah, these are great call-outs 👍

A lot of design decisions to be made. For example, to your point about the 4/4 limit, one implementation could see us optimizing to own only one [sort key] attribute. Very pie-in-the-sky, but where that attribute is slotted would be up to the user. Instead of just all the way left (isolated) or all the way right (clustered), they would have complete control.

"This will likely change logic for both the update and patch methods, but only for multi-attribute indexes."

Yeah, it will enable removing a lot of validation/constraint code for multi-attribute indexes 💪

tywalch · Nov 22 '25 18:11