boto3
boto3 copied to clipboard
DynamoDB paginator fails when partition key is of type number
Describe the bug Pagination for dynamodb fails when the partition key is of type number.
This happens because the returned LastEvaluatedKey is of type Decimal
and it cannot be json encoded (json.dumps
fails). This happens in botocore.paginate.encode
it tries to do convert the token to string.
This issue talks about the issue with Decimal
types. The gist is that a Number type is returned like this { value: 3}
-> { value: { N: Decimal("3")}}
. This cannot be converted to string by json.
Steps to reproduce
- Create a table, choose Number as type of partition key, choose a sort key (anything is fine)
- Add two records with same partition key and different sort key
- Run a query similar to the following.
ddb_client = boto3.resource("dynamodb").meta.client
query_paginator = ddb_client.get_paginator("query")
page_iterator = query_paginator.paginate(
TableName=<TABLE-NAME>,
KeyConditionExpression=Key(<PK>).eq(<PK-VALUE>),
ScanIndexForward=False,
PaginationConfig = {
MaxItems: 1,
PageSize: 1,
}
)
for page in page_iterator:
for return_value in page["Items"]:
print(return_value)
Expected behavior
Pagination should succeed. It should handle the Decimal
type by itself.
Hi @twitu,
Thanks for pointing this out. I can definitely see where this could cause some problems. Would you be able to provide debug logs by adding boto3.set_stream_logger('')
to your code? Please redact any sensitive information, such as account numbers. Thanks!
Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.
I've added debug logs here. This is from running the function in a vanilla Python3.8 Lambda.
botocore.retryhandler [DEBUG] No retry needed.
botocore.hooks [DEBUG] Event after-call.dynamodb.Query: calling handler <bound method TransformationInjector.inject_attribute_value_output of <boto3.dynamodb.transform.TransformationInjector object at 0x7f8f4a706d30>>
[ERROR] TypeError: Object of type Decimal is not JSON serializable
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 158, in lambda_handler
for page in page_iterator:
File "/var/runtime/botocore/paginate.py", line 280, in __iter__
self._truncate_response(parsed, primary_result_key,
File "/var/runtime/botocore/paginate.py", line 424, in _truncate_response
self.resume_token = next_token
File "/var/runtime/botocore/paginate.py", line 230, in resume_token
self._resume_token = self._token_encoder.encode(value)
File "/var/runtime/botocore/paginate.py", line 65, in encode
json_string = json.dumps(encoded_token)
File "/var/lang/lib/python3.8/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/var/lang/lib/python3.8/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/var/lang/lib/python3.8/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/var/lang/lib/python3.8/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
I have a specific use case where I want n
records satisfying a filter condition. I don't really need the ability to paginate. Here's a partial workaround** for anyone who might be facing this issue.
In the pseudocode below I just set the MaxItem
to the desired n
in PaginationConfig
i.e. I do not set the PageSize
parameter. This makes the first page automatically itself yield the required number of values.
page_iterator = ddb_client.get_paginator("query").paginate(PaginationConfig = { "MaxItem": n }, **other_arguments)
for page in page_iterator:
for return_value in page["Items"]:
print(return_value)
Hi @twitu,
Thanks for the additional information. Leaving this marked as a bug for now, as it appears #369 may need to be addressed before anything can be done about this.
I don't see why that's the case. It's possible to handle encoding decimal.Decimal
objects without making not using decimals a generally available feature, especially since the failing function, TokenEncoder.encode
, is documented as producing an "opaque string". Presumably one could extend TokenEncoder._encode
, or provide a subclass of JSONEncoder
that handles decimals.
$ git diff
diff --git a/botocore/paginate.py b/botocore/paginate.py
index b08c7ed8b..501f8bf74 100644
--- a/botocore/paginate.py
+++ b/botocore/paginate.py
@@ -15,6 +15,7 @@ from itertools import tee
from botocore.compat import six
+from decimal import Decimal
import jmespath
import json
import base64
@@ -27,6 +28,13 @@ from botocore.utils import set_value_from_jmespath, merge_dicts
log = logging.getLogger(__name__)
+class DecimalEncoder(json.JSONEncoder):
+ def default(self, o):
+ if isinstance(o, Decimal):
+ return str(o)
+ return json.JSONEncoder.default(o)
+
+
class TokenEncoder(object):
"""Encodes dictionaries into opaque strings.
@@ -52,7 +60,7 @@ class TokenEncoder(object):
try:
# Try just using json dumps first to avoid having to traverse
# and encode the dict. In 99.9999% of cases this will work.
- json_string = json.dumps(token)
+ json_string = json.dumps(token, cls=DecimalEncoder)
except (TypeError, UnicodeDecodeError):
# If normal dumping failed, go through and base64 encode all bytes.
encoded_token, encoded_keys = self._encode(token, [])
seems to work
(ETA: well, you'd also want to update the second call to json.dumps
.)
Up. Also having this issue. Unfortunately, the fix proposed by @twitu does not work with a large number of items, since boto seems to self paginate. However, it does works for a small number of items.
Any update @stobrien89 ? Seems like the https://github.com/boto/boto3/issues/369 issue could last another 7 years :)
Anyway, thanks for your time and sweat.