PynamoDB
Initial DescribeTable before GetItem
While profiling my application, I realized that an initial DescribeTable call is sent to DynamoDB before the first GetItem call. Is there a way to remove that DescribeTable call? It adds a lot of latency to the application running on AWS Lambda.
Below is a complete example that can be run locally with DynamoDB Local. The first execution creates the table and adds a first item. The second execution calls GetItem four times, and a DescribeTable call is issued before the first GetItem. The wrapt package is required.
$ python test_describe_table.py
0.05 ms - Calling DescribeTable
429.72 ms - Calling DescribeTable
440.19 ms - Calling CreateTable
571.44 ms - Calling DescribeTable
583.39 ms - Calling DescribeTable
600.91 ms - Calling PutItem
$ python test_describe_table.py
0.04 ms - Calling DescribeTable
57.30 ms - Calling GetItem
100.75 ms - Calling GetItem
114.38 ms - Calling GetItem
129.07 ms - Calling GetItem
142.16 ms - Calling PutItem
from random import randint
from time import time

from wrapt import wrap_function_wrapper
from pynamodb.models import Model
from pynamodb.attributes import NumberAttribute, UnicodeAttribute
from pynamodb.exceptions import TableDoesNotExist


class TestTable(Model):
    class Meta:
        table_name = 'TestTable'
        host = 'http://localhost:8000'

    key = UnicodeAttribute(hash_key=True, attr_name='k')
    value = NumberAttribute(default=0, attr_name='v')


def patch_dynamodb():
    wrap_function_wrapper(
        'pynamodb.connection.base',
        'Connection.dispatch',
        xray_traced_pynamodb,
    )


def xray_traced_pynamodb(wrapped, instance, args, kwargs):
    print('{:>6.2f} ms - Calling {}'.format(1000 * (time() - start_time), args[0]))
    return wrapped(*args, **kwargs)


if __name__ == '__main__':
    start_time = time()
    patch_dynamodb()
    key = 'test-key'
    try:
        item = TestTable.get(key)
        item = TestTable.get(key)
        item = TestTable.get(key)
        item = TestTable.get(key)
    except TableDoesNotExist:
        TestTable.create_table(read_capacity_units=1, write_capacity_units=1, wait=True)
    item = TestTable(key, value=randint(0, 100))
    item.save()
Yeah, that describe_table is killing my app performance too (running in Lambda). This comes from Model._get_meta_data. Is there any way we can specify this meta info without doing DescribeTable? I've had to overload repr just to stop those calls when running my unit tests XD.
I'm working on a PR now to bypass DescribeTable. Any suggestions on how I should proceed?
Looks like the model classes can define _meta_table with the result of a DescribeTable call and thus avoid this call at runtime:
https://github.com/pynamodb/PynamoDB/blob/master/pynamodb/models.py#L1251
Yeah, I ended up overriding the _get_meta_data classmethod:

@classmethod
def _get_meta_data(cls):
    if cls._meta_table is None:
        cls._meta_table = MetaTable(describe_table.generate())
    return cls._meta_table

where describe_table.generate() returns the describe_table data. Then I iterate over all my models and set it in the Connection as well:

for name, Model in models.__dict__.items():
    if isinstance(Model, MetaModel):
        Model.Meta.table_name = 'project-{}-{}'.format(
            STAGE, Model.Meta.simple_name)
        connection_tables = Model._get_connection().connection._tables
        connection_tables[Model.Meta.table_name] = Model._get_meta_data()
Thanks @phil-hachey for your comment. Here is the code I am using to generate the DescribeTable data and patch the Model table.
from pynamodb.connection.base import MetaTable
from pynamodb.attributes import Attribute
from pynamodb.constants import ATTR_TYPE_MAP


def patch_meta_table(table):
    meta = generate_meta_table(table)
    table._get_connection().connection.client
    table._get_connection().connection._tables[table.Meta.table_name] = meta
    table._meta_table = meta


def generate_meta_table(table):
    data = {
        'AttributeDefinitions': [],
        'KeySchema': [],
        'TableName': table.Meta.table_name,
        'TableStatus': 'ACTIVE',
        'CreationDateTime': 0,
        'ProvisionedThroughput': {
            'LastIncreaseDateTime': 0.0,
            'LastDecreaseDateTime': 0.0,
            'NumberOfDecreasesToday': 0,
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        },
        'TableSizeBytes': 0,
        'ItemCount': 0,
        'TableArn': 'arn:aws:dynamodb:ddblocal:000000000000:table/' + table.Meta.table_name,
    }
    attrs = [i for i in dir(table) if issubclass(type(getattr(table, i)), Attribute)]
    for attr in attrs:
        data['AttributeDefinitions'].append({
            'AttributeName': getattr(table, attr).attr_name,
            'AttributeType': ATTR_TYPE_MAP[getattr(table, attr).attr_type],
        })
        if getattr(table, attr).is_hash_key:
            data['KeySchema'].append({'AttributeName': getattr(
                table, attr).attr_name, 'KeyType': 'HASH'})
        elif getattr(table, attr).is_range_key:
            data['KeySchema'].append({'AttributeName': getattr(
                table, attr).attr_name, 'KeyType': 'RANGE'})
    return MetaTable(data)
Now I am patching the TestTable with patch_meta_table(TestTable). Please let me know what you think of this. We can work on a PR to avoid modifying private variables inside PynamoDB. Maybe a variable in the Meta model class that would auto-generate the MetaTable.
How did you guys end up solving this? I've got Indexes and Enums as well, which throws a bit of a wrench into things.
Hi! I'm going to fix it. I have some ideas and am going to push a PR next week, so feel free to assign me to the issue.
I've noticed that Model._get_schema returns the schema with pythonic keys, but wherever it is used, it is always converted to camel case (because the DynamoDB API expects camel case). May I change the return value of Model._get_schema to camel case?
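To make the conversion in question concrete, here is a minimal sketch of turning pythonic (snake_case) keys into the capitalized key names the DynamoDB API uses. The helper names to_camel_case and camelize, and the shape of the sample schema, are assumptions for illustration only; they are not PynamoDB functions and not the exact output of Model._get_schema.

```python
def to_camel_case(key: str) -> str:
    """Convert a snake_case key such as 'attribute_name' to 'AttributeName'."""
    return "".join(part.capitalize() for part in key.split("_"))


def camelize(value):
    """Recursively convert dict keys; leaf values (strings, numbers) are left untouched."""
    if isinstance(value, dict):
        return {to_camel_case(k): camelize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize(item) for item in value]
    return value


# Hypothetical schema with pythonic keys (illustrative shape only).
schema = {
    "attribute_definitions": [{"attribute_name": "k", "attribute_type": "S"}],
    "key_schema": [{"attribute_name": "k", "key_type": "HASH"}],
}
camel_schema = camelize(schema)
```

Only the keys are rewritten, so attribute names and key types such as 'k' and 'HASH' pass through unchanged.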
@levesquejf's code broke querying by index for me. Is there a way to fix that?
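One plausible cause, not confirmed in this thread, is that the generated table description contains no index entries, whereas a real DescribeTable response lists them under GlobalSecondaryIndexes and LocalSecondaryIndexes. The sketch below builds such an entry in the DescribeTable response shape. The helper index_description and the index name 'by-value' are hypothetical, and whether adding these entries is enough for PynamoDB's MetaTable is an untested assumption.

```python
from typing import Optional


def index_description(index_name: str, hash_key: str,
                      range_key: Optional[str] = None) -> dict:
    """Build one index entry shaped like those in a DescribeTable response."""
    key_schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    if range_key is not None:
        key_schema.append({"AttributeName": range_key, "KeyType": "RANGE"})
    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": {"ProjectionType": "ALL"},
        "IndexStatus": "ACTIVE",
    }


# Minimal stand-in for the 'data' dict built in generate_meta_table above.
data = {
    "TableName": "TestTable",
    "AttributeDefinitions": [
        {"AttributeName": "k", "AttributeType": "S"},
        {"AttributeName": "v", "AttributeType": "N"},
    ],
    "KeySchema": [{"AttributeName": "k", "KeyType": "HASH"}],
}
# Hypothetical global secondary index keyed on the 'v' attribute.
data["GlobalSecondaryIndexes"] = [index_description("by-value", "v")]
```

Any attribute used as an index key also has to appear in AttributeDefinitions, which the generator above already does for every model attribute.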
I have hit this issue when using the new discriminators. Now every model gets its own connection.
I have solved the issue by having a parent model like this:
from pynamodb.connection import TableConnection
from pynamodb.models import Model


class BaseModel(Model):
    # Shared cache: one TableConnection per table name.
    _connection_map = {}

    @classmethod
    def _get_connection(cls) -> TableConnection:
        """
        This makes sure that we have only one connection per table.
        """
        if cls.Meta.table_name not in BaseModel._connection_map:
            BaseModel._connection_map[cls.Meta.table_name] = super()._get_connection()
        return BaseModel._connection_map[cls.Meta.table_name]
I think this is the biggest limitation of PynamoDB nowadays. It is almost unusable inside a Lambda function (which is the "natural mate" of DynamoDB). I'm getting about 3s of startup time when using 2 models.
Can anyone provide a workaround properly working with the current PynamoDB release (5.2.1)?
@lrodorigo -- we just ran into this; our solution was to use the suggestion right above your comment ☝️ and also to use lambdas with provisioned concurrency; for the latter, we "pre-warm" pynamodb by doing something like this at the start of the lambda function:
BaseModel._get_connection().get_meta_table(BaseModel.Meta.table_name)
Provisioned concurrency ensures that the above won't actually happen prior to the lambda being available for invocation. Still, this entire scenario, plus the fact that this issue is almost 5 yrs old, has us looking around for other options.
This is indeed the biggest limit of PynamoDB.
@ikonst or someone maintaining this lib: can you suggest a workaround or comment on whether a fix can be implemented? Instead of single-digit-millisecond hits to DynamoDB, you get 100+ milliseconds for every model to describe the table, which is completely unnecessary, as you know the schema and indexes beforehand and should be able to provide them to PynamoDB.
I guess since this issue is 5 years old we should also look around for other options.
I agree we should get rid of the DescribeTable call. It also makes mocking (for tests) more awkward, since the 1st invocation behaves differently from the 2nd+ invocations.
@ikonst Is that something you can do?
If not can you describe what is needed in order to get rid of it so someone can contribute with a PR?
We're having a code freeze before 🎃 Halloween at my job, so I'm taking a stab at it :)
Awesome!
Seems I pinged the correct person, thanks a lot!