PynamoDB

Initial DescribeTable before GetItem

Open • levesquejf opened this issue 6 years ago • 11 comments

While profiling my application, I realized that an initial DescribeTable call is sent to DynamoDB before the first GetItem call. Is there a way to remove that DescribeTable call? It adds a lot of latency to my application, which runs on AWS Lambda.

Below is a complete example that can be run locally against DynamoDB Local. The first run creates the table and adds one item. The second run calls GetItem four times, and a DescribeTable call is issued before the first GetItem. The wrapt package is required.

$ python test_describe_table.py
  0.05 ms - Calling DescribeTable
429.72 ms - Calling DescribeTable
440.19 ms - Calling CreateTable
571.44 ms - Calling DescribeTable
583.39 ms - Calling DescribeTable
600.91 ms - Calling PutItem

$ python test_describe_table.py
  0.04 ms - Calling DescribeTable
 57.30 ms - Calling GetItem
100.75 ms - Calling GetItem
114.38 ms - Calling GetItem
129.07 ms - Calling GetItem
142.16 ms - Calling PutItem

test_describe_table.py:

from random import randint
from time import time
from wrapt import wrap_function_wrapper

from pynamodb.models import Model
from pynamodb.attributes import NumberAttribute, UnicodeAttribute
from pynamodb.exceptions import TableDoesNotExist


class TestTable(Model):
    class Meta:
        table_name = 'TestTable'
        host = 'http://localhost:8000'  # DynamoDB Local

    key = UnicodeAttribute(hash_key=True, attr_name='k')
    value = NumberAttribute(default=0, attr_name='v')


def patch_dynamodb():
    # Wrap the low-level Connection.dispatch so every DynamoDB operation is logged.
    wrap_function_wrapper(
        'pynamodb.connection.base',
        'Connection.dispatch',
        xray_traced_pynamodb,
    )


def xray_traced_pynamodb(wrapped, instance, args, kwargs):
    # args[0] is the operation name (e.g. 'DescribeTable', 'GetItem').
    # Print the time elapsed since start_time, then forward the call unchanged.
    print('{:>6.2f} ms - Calling {}'.format(1000 * (time() - start_time), args[0]))
    return wrapped(*args, **kwargs)


if __name__ == '__main__':
    start_time = time()
    patch_dynamodb()
    key = 'test-key'
    try:
        # Second run: the table exists, so these four reads succeed.
        item = TestTable.get(key)
        item = TestTable.get(key)
        item = TestTable.get(key)
        item = TestTable.get(key)
    except TableDoesNotExist:
        # First run: create the table (waiting until it is ACTIVE) and store one item.
        TestTable.create_table(read_capacity_units=1, write_capacity_units=1, wait=True)
        item = TestTable(key, value=randint(0, 100))
        item.save()

levesquejf, Nov 30 '17

Yeah, that DescribeTable call is killing my app's performance too (running in Lambda). It comes from Model._get_meta_data. Is there any way we can specify this metadata without calling DescribeTable? I've had to override __repr__ just to stop those calls when running my unit tests XD.

I'm working on a PR now to bypass DescribeTable. Any suggestions on how I should proceed?
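
To make the mechanism concrete, here is a minimal stdlib-only sketch of the lazy, cache-on-first-use pattern described above. FakeConnection and LazyModel are hypothetical stand-ins, not PynamoDB classes; they only reproduce the call order seen in the second run of the original report (one DescribeTable, then four GetItem calls).

```python
from typing import Any, Dict, List, Optional


class FakeConnection:
    """Hypothetical stand-in for a DynamoDB connection; it only records operation names."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def describe_table(self, table_name: str) -> Dict[str, Any]:
        self.calls.append("DescribeTable")
        return {"TableName": table_name, "KeySchema": [{"AttributeName": "k", "KeyType": "HASH"}]}

    def get_item(self, table_name: str, key: str) -> Dict[str, Any]:
        self.calls.append("GetItem")
        return {"k": key}


class LazyModel:
    """Hypothetical model that fetches its table description lazily and caches it."""

    table_name = "TestTable"
    connection = FakeConnection()
    _meta_table: Optional[Dict[str, Any]] = None

    @classmethod
    def _get_meta_data(cls) -> Dict[str, Any]:
        # The first call pays for a DescribeTable round trip; later calls hit the cache.
        if cls._meta_table is None:
            cls._meta_table = cls.connection.describe_table(cls.table_name)
        return cls._meta_table

    @classmethod
    def get(cls, key: str) -> Dict[str, Any]:
        cls._get_meta_data()  # stands in for needing the key schema to build the request
        return cls.connection.get_item(cls.table_name, key)


for _ in range(4):
    LazyModel.get("test-key")
print(LazyModel.connection.calls)
# ['DescribeTable', 'GetItem', 'GetItem', 'GetItem', 'GetItem']
```

Only the first read pays for the extra round trip, which matches the trace and explains why the cost shows up as cold-start latency in Lambda.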

phil-hachey, Dec 04 '17

Looks like the model classes can define _meta_table with the result of a DescribeTable call and thus avoid this call at runtime:

https://github.com/pynamodb/PynamoDB/blob/master/pynamodb/models.py#L1251

prestomation, Dec 08 '17

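Yeah, I ended up overriding the _get_meta_data classmethod:

from pynamodb.connection.base import MetaTable


@classmethod
def _get_meta_data(cls):
    # Build the MetaTable from a pre-generated description instead of calling DescribeTable.
    if cls._meta_table is None:
        cls._meta_table = MetaTable(describe_table.generate())
    return cls._meta_table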

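where describe_table.generate() returns the table description, i.e. the contents of the "Table" key of a DescribeTable response. Then I iterate over all my models and set it on the Connection as well:

from pynamodb.models import MetaModel

# models, STAGE and Meta.simple_name are project-specific.
for model in vars(models).values():
    if isinstance(model, MetaModel):
        model.Meta.table_name = 'project-{}-{}'.format(STAGE, model.Meta.simple_name)

        # Also seed the low-level connection's cache so it never calls DescribeTable.
        connection_tables = model._get_connection().connection._tables
        connection_tables[model.Meta.table_name] = model._get_meta_data()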
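
The override above seeds two caches: the model's _meta_table and the connection's _tables dict. The stdlib-only sketch below shows why both matter, under the assumption (modelled on this comment, not verified against a specific PynamoDB release) that the model-level lookup always issues a fresh DescribeTable while the connection consults its own cache before each request. FakeConnection, SketchModel and run are hypothetical names.

```python
from typing import Any, Dict, List, Optional


class FakeConnection:
    """Hypothetical low-level connection with its own per-table description cache."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self._tables: Dict[str, Dict[str, Any]] = {}

    def describe_table(self, table_name: str) -> Dict[str, Any]:
        # Always a round trip; the result also refreshes the connection cache.
        self.calls.append("DescribeTable")
        self._tables[table_name] = {"TableName": table_name}
        return self._tables[table_name]

    def get_meta_table(self, table_name: str) -> Dict[str, Any]:
        # Round trip only when this connection has no cached entry for the table.
        if table_name not in self._tables:
            return self.describe_table(table_name)
        return self._tables[table_name]

    def get_item(self, table_name: str, key: str) -> Dict[str, Any]:
        self.get_meta_table(table_name)  # the connection layer consults its own cache
        self.calls.append("GetItem")
        return {"k": key}


def run(seed_model: bool, seed_connection: bool) -> List[str]:
    """Do one read and return the simulated operations, given which caches were pre-seeded."""
    connection = FakeConnection()
    description = {"TableName": "TestTable"}  # stands in for a pre-generated description

    class SketchModel:
        table_name = "TestTable"
        _meta_table: Optional[Dict[str, Any]] = description if seed_model else None

        @classmethod
        def _get_meta_data(cls) -> Dict[str, Any]:
            if cls._meta_table is None:
                cls._meta_table = connection.describe_table(cls.table_name)
            return cls._meta_table

        @classmethod
        def get(cls, key: str) -> Dict[str, Any]:
            cls._get_meta_data()
            return connection.get_item(cls.table_name, key)

    if seed_connection:
        connection._tables["TestTable"] = description
    SketchModel.get("test-key")
    return connection.calls


for seed_model in (False, True):
    for seed_connection in (False, True):
        print(seed_model, seed_connection, run(seed_model, seed_connection))
# Only seed_model=True with seed_connection=True avoids the DescribeTable call.
```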

phil-hachey, Dec 08 '17

Thanks @phil-hachey for your comment. Here is the code I am using to generate the DescribeTable data and patch the Model table.

from pynamodb.connection.base import MetaTable
from pynamodb.attributes import Attribute
from pynamodb.constants import ATTR_TYPE_MAP


def patch_meta_table(table):
    """Pre-populate both table-description caches so no DescribeTable call is made."""
    meta = generate_meta_table(table)
    connection = table._get_connection().connection
    connection.client  # touched only to create the lazily initialized botocore client now
    connection._tables[table.Meta.table_name] = meta  # low-level Connection cache
    table._meta_table = meta  # Model-level cache used by _get_meta_data


def generate_meta_table(table):
    """Build a fake DescribeTable 'Table' payload from the model's attributes."""
    data = {
        'AttributeDefinitions': [],
        'KeySchema': [],
        'TableName': table.Meta.table_name,
        'TableStatus': 'ACTIVE',
        'CreationDateTime': 0,
        'ProvisionedThroughput': {
            'LastIncreaseDateTime': 0.0,
            'LastDecreaseDateTime': 0.0,
            'NumberOfDecreasesToday': 0,
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        },
        'TableSizeBytes': 0,
        'ItemCount': 0,
        'TableArn': 'arn:aws:dynamodb:ddblocal:000000000000:table/' + table.Meta.table_name,
    }

    for name in dir(table):
        attr = getattr(table, name)
        if not isinstance(attr, Attribute):
            continue
        # Unlike a real DescribeTable response, this lists every attribute (not only
        # key attributes) and contains no index definitions.
        data['AttributeDefinitions'].append({
            'AttributeName': attr.attr_name,
            'AttributeType': ATTR_TYPE_MAP[attr.attr_type],
        })
        if attr.is_hash_key:
            data['KeySchema'].append({'AttributeName': attr.attr_name, 'KeyType': 'HASH'})
        elif attr.is_range_key:
            data['KeySchema'].append({'AttributeName': attr.attr_name, 'KeyType': 'RANGE'})

    return MetaTable(data)

Now I am patching TestTable with patch_meta_table(TestTable). Please let me know what you think of this. We could work on a PR to avoid modifying private variables inside PynamoDB, for example an option in the model's Meta class that would auto-generate the MetaTable.

levesquejf, Dec 29 '17

How did you guys end up solving this? I've got indexes and enums as well, which throws a bit of a wrench into things.

ricky-sb, Oct 14 '19

Hi! I'm going to fix it. I have some ideas and am going to push a PR next week, so feel free to assign me to the issue.

shmygol, Jan 23 '20

I've noticed that Model._get_schema returns the schema with Pythonic (snake_case) keys, but wherever it's used, it's always converted to camel case (because the DynamoDB API expects camel case). May I change the return value of Model._get_schema to camel case?
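
As a rough illustration of the conversion being discussed, here is a hypothetical recursive helper that renames snake_case dict keys to the PascalCase ("upper camel case") names the DynamoDB API uses. The input dict assumes the shape _get_schema returned at the time (snake_case keys such as attribute_definitions); check it against your PynamoDB version. Values are left alone, so the helper only suits dicts whose keys are all API field names.

```python
from typing import Any


def to_pascal_case(name: str) -> str:
    """Convert a snake_case name, e.g. 'key_schema', to PascalCase ('KeySchema')."""
    return "".join(part.capitalize() for part in name.split("_"))


def convert_keys(value: Any) -> Any:
    """Recursively rename dict keys to PascalCase; values are left untouched."""
    if isinstance(value, dict):
        return {to_pascal_case(key): convert_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [convert_keys(item) for item in value]
    return value


# Hypothetical snake_case schema; attribute names such as 'user_id' are values, so they survive.
schema = {
    "attribute_definitions": [{"attribute_name": "user_id", "attribute_type": "S"}],
    "key_schema": [{"attribute_name": "user_id", "key_type": "HASH"}],
}
print(convert_keys(schema))
```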

shmygol, Jan 27 '20

@levesquejf's code broke querying by index for me. Is there a way to fix that?
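
One plausible cause (an assumption; the thread does not confirm it) is that generate_meta_table above emits no GlobalSecondaryIndexes entry, so nothing tells the connection which key names an index uses. A hypothetical, stdlib-only helper that adds index entries to a DescribeTable-shaped dict could look like this; wrapping the result in MetaTable would follow the pattern above. Local secondary indexes are not covered, and the ALL projection is an assumption to adjust to the real index.

```python
from typing import Any, Dict, List, Optional, Tuple

# (attribute name, DynamoDB scalar type such as "S", "N" or "B")
Key = Tuple[str, str]
# (index hash key, optional index range key)
IndexKeys = Tuple[Key, Optional[Key]]


def key_schema(hash_key: Key, range_key: Optional[Key] = None) -> List[Dict[str, str]]:
    schema = [{"AttributeName": hash_key[0], "KeyType": "HASH"}]
    if range_key is not None:
        schema.append({"AttributeName": range_key[0], "KeyType": "RANGE"})
    return schema


def describe_table_payload(
    table_name: str,
    hash_key: Key,
    range_key: Optional[Key] = None,
    global_indexes: Optional[Dict[str, IndexKeys]] = None,
) -> Dict[str, Any]:
    """Build a DescribeTable-style 'Table' dict that also lists global secondary indexes."""
    global_indexes = global_indexes or {}

    # Like DynamoDB, only declare attributes used as table or index keys (deduplicated).
    keys: List[Optional[Key]] = [hash_key, range_key]
    for index_hash, index_range in global_indexes.values():
        keys.extend([index_hash, index_range])
    definitions: Dict[str, str] = {}
    for key in keys:
        if key is not None:
            definitions[key[0]] = key[1]

    return {
        "TableName": table_name,
        "TableStatus": "ACTIVE",
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for name, attr_type in definitions.items()
        ],
        "KeySchema": key_schema(hash_key, range_key),
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": key_schema(index_hash, index_range),
                "Projection": {"ProjectionType": "ALL"},  # assumed; match the real index
            }
            for index_name, (index_hash, index_range) in global_indexes.items()
        ],
    }


payload = describe_table_payload(
    "TestTable",
    hash_key=("k", "S"),
    global_indexes={"value-index": (("v", "N"), None)},
)
print(payload["GlobalSecondaryIndexes"])
```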

jonathan-kosgei, Nov 05 '21

I hit this issue when using the new discriminators: now every model class gets its own connection, so each one makes its own DescribeTable call.

I have solved the issue by having a parent model like this:

from pynamodb.connection.table import TableConnection
from pynamodb.models import Model


class BaseModel(Model):
    # Class-level dict shared by every subclass: table name -> TableConnection.
    _connection_map = {}

    @classmethod
    def _get_connection(cls) -> TableConnection:
        """
        This makes sure that we have only one connection per table.
        """
        if cls.Meta.table_name not in BaseModel._connection_map:
            BaseModel._connection_map[cls.Meta.table_name] = super()._get_connection()
        return BaseModel._connection_map[cls.Meta.table_name]
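
To see why keying connections by table name helps, here is a stdlib-only sketch comparing per-class caching (the behavior described above for discriminator subclasses) with the shared map used by BaseModel. All classes are hypothetical stand-ins that only count simulated DescribeTable calls; nothing here imports PynamoDB or talks to DynamoDB.

```python
from typing import Dict


class FakeTableConnection:
    """Hypothetical stand-in for TableConnection: describes its table once, on first use."""

    describe_count = 0  # simulated DescribeTable calls across all instances

    def __init__(self, table_name: str) -> None:
        self.table_name = table_name
        self.described = False

    def get_item(self, key: str) -> Dict[str, str]:
        if not self.described:
            FakeTableConnection.describe_count += 1
            self.described = True
        return {"k": key}


class PerClassBase:
    """Caches a connection on each class, so every subclass builds its own."""

    table_name = "TestTable"

    @classmethod
    def _get_connection(cls) -> FakeTableConnection:
        if "_connection" not in vars(cls):  # look only at this class, not its parents
            cls._connection = FakeTableConnection(cls.table_name)
        return cls._connection


class SharedBase:
    """Caches connections by table name on the base class, like BaseModel above."""

    table_name = "TestTable"
    _connection_map: Dict[str, FakeTableConnection] = {}

    @classmethod
    def _get_connection(cls) -> FakeTableConnection:
        if cls.table_name not in SharedBase._connection_map:
            SharedBase._connection_map[cls.table_name] = FakeTableConnection(cls.table_name)
        return SharedBase._connection_map[cls.table_name]


# Two hypothetical subclasses per strategy, all backed by the same table.
class CatA(PerClassBase):
    pass


class DogA(PerClassBase):
    pass


class CatB(SharedBase):
    pass


class DogB(SharedBase):
    pass


def describes_for(*models: type) -> int:
    """Read one item through each model and count the simulated DescribeTable calls."""
    FakeTableConnection.describe_count = 0
    for model in models:
        model._get_connection().get_item("test-key")
    return FakeTableConnection.describe_count


per_class = describes_for(CatA, DogA)
shared = describes_for(CatB, DogB)
print(per_class, shared)  # 2 1
```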

fdobrovolny, Jun 08 '22

I think this is the biggest limitation of PynamoDB nowadays. It is almost unusable inside a Lambda function (which is the "natural mate" of DynamoDB). I'm getting about 3 s of startup time when using two models.

Can anyone provide a workaround that works properly with the current PynamoDB release (5.2.1)?

lrodorigo, Aug 23 '22

@lrodorigo -- we just ran into this; our solution was to use the suggestion right above your comment ☝️ and also to use Lambda functions with provisioned concurrency. For the latter, we "pre-warm" PynamoDB by doing something like this at the start of the Lambda function:

    BaseModel._get_connection().get_meta_table()

Provisioned concurrency ensures that the above actually runs before the Lambda function becomes available for invocation. Still, this entire scenario, plus the fact that this issue is almost 5 years old, has us looking around for other options.
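
Assuming the warm-up line sits at module scope (outside the handler), it runs while the execution environment initializes, which provisioned concurrency does ahead of invocations. The sketch below simulates this with hypothetical stand-in classes; handler(event, context) follows the standard Python Lambda handler signature, and nothing here calls AWS.

```python
from typing import Any, Dict, List, Optional

CALLS: List[str] = []  # simulated DynamoDB operations, in order


class FakeTableConnection:
    """Hypothetical stand-in for a table connection with a lazily filled description cache."""

    def __init__(self, table_name: str) -> None:
        self.table_name = table_name
        self._meta: Optional[Dict[str, Any]] = None

    def get_meta_table(self) -> Dict[str, Any]:
        if self._meta is None:
            CALLS.append("DescribeTable")
            self._meta = {"TableName": self.table_name}
        return self._meta

    def get_item(self, key: str) -> Dict[str, Any]:
        self.get_meta_table()
        CALLS.append("GetItem")
        return {"k": key}


CONNECTION = FakeTableConnection("TestTable")

# Module scope: in Lambda this runs once per execution environment, during the init phase.
CONNECTION.get_meta_table()


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Invocations only pay for the GetItem; the description is already cached.
    return CONNECTION.get_item(event["key"])


handler({"key": "test-key"}, None)
handler({"key": "test-key"}, None)
print(CALLS)  # ['DescribeTable', 'GetItem', 'GetItem']
```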

msluyter, Sep 16 '22

This is indeed the biggest limit of PynamoDB.

@ikonst, or whoever maintains this library: can you suggest a workaround or comment on whether a fix can be implemented? Instead of single-digit-millisecond requests to DynamoDB, you get a 100+ ms hit for every model just to describe the table, which is completely unnecessary since you know the schema and indexes beforehand and should be able to provide them to PynamoDB.

I guess, since this issue is 5 years old, we should also look around for other options.

damir-fell, Oct 26 '22

I agree we should get rid of the DescribeTable call. It also makes mocking (for tests) more awkward, since the first invocation behaves differently from the second and later ones.

ikonst, Oct 26 '22

@ikonst, is that something you can do?

If not, can you describe what is needed to get rid of it, so that someone can contribute a PR?

damir-fell, Oct 27 '22

We're having a code freeze before 🎃 Halloween at my job, so I'm taking a stab at it :)

ikonst, Oct 27 '22

Awesome!

damir-fell, Oct 28 '22

Seems I pinged the correct person, thanks a lot!

damir-fell, Oct 28 '22