DynamoDB paginator fails when partition key is of type number

Open twitu opened this issue 3 years ago • 9 comments

Describe the bug Pagination for DynamoDB fails when the partition key is of type Number.

This happens because the returned LastEvaluatedKey contains a Decimal, which cannot be JSON-encoded (json.dumps fails). The failure occurs in the encode method in botocore.paginate, when it tries to convert the token to a string.

This issue discusses the problem with Decimal types. The gist is that a Number type is returned like this: { value: 3} -> { value: { N: Decimal("3")}}. The json module cannot serialize that Decimal to a string.
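The failure can be reproduced without DynamoDB at all. The following is a minimal sketch, assuming a key shaped like the one described above; the attribute names pk and sk are hypothetical.

```python
import json
from decimal import Decimal

# A key containing a Decimal, similar to what the issue describes for a
# Number partition key. The attribute names are hypothetical.
last_evaluated_key = {"pk": Decimal("3"), "sk": "a"}

try:
    json.dumps(last_evaluated_key)
    error_message = None
except TypeError as exc:
    # The standard json encoder has no built-in handling for Decimal.
    error_message = str(exc)

print(error_message)
```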

Steps to reproduce

  1. Create a table, choose Number as type of partition key, choose a sort key (anything is fine)
  2. Add two records with same partition key and different sort key
  3. Run a query similar to the following.
import boto3
from boto3.dynamodb.conditions import Key

# Replace the <...> placeholders with your own table name, key name and value.
ddb_client = boto3.resource("dynamodb").meta.client
query_paginator = ddb_client.get_paginator("query")

page_iterator = query_paginator.paginate(
    TableName="<TABLE-NAME>",
    KeyConditionExpression=Key("<PK>").eq(<PK-VALUE>),
    ScanIndexForward=False,
    PaginationConfig={
        "MaxItems": 1,
        "PageSize": 1,
    },
)

for page in page_iterator:
    for return_value in page["Items"]:
        print(return_value)

Expected behavior Pagination should succeed. It should handle the Decimal type by itself.

twitu avatar Jul 16 '21 12:07 twitu

Hi @twitu,

Thanks for pointing this out. I can definitely see where this could cause some problems. Would you be able to provide debug logs by adding boto3.set_stream_logger('') to your code? Please redact any sensitive information, such as account numbers. Thanks!

stobrien89 avatar Jul 19 '21 21:07 stobrien89

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

github-actions[bot] avatar Jul 26 '21 22:07 github-actions[bot]

I've added debug logs here. This is from running the function in a vanilla Python3.8 Lambda.

botocore.retryhandler [DEBUG] No retry needed.

botocore.hooks [DEBUG] Event after-call.dynamodb.Query: calling handler <bound method TransformationInjector.inject_attribute_value_output of <boto3.dynamodb.transform.TransformationInjector object at 0x7f8f4a706d30>>

[ERROR] TypeError: Object of type Decimal is not JSON serializable
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 158, in lambda_handler
    for page in page_iterator:
  File "/var/runtime/botocore/paginate.py", line 280, in __iter__
    self._truncate_response(parsed, primary_result_key,
  File "/var/runtime/botocore/paginate.py", line 424, in _truncate_response
    self.resume_token = next_token
  File "/var/runtime/botocore/paginate.py", line 230, in resume_token
    self._resume_token = self._token_encoder.encode(value)
  File "/var/runtime/botocore/paginate.py", line 65, in encode
    json_string = json.dumps(encoded_token)
  File "/var/lang/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/var/lang/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/var/lang/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/var/lang/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '

twitu avatar Jul 29 '21 23:07 twitu

I have a specific use case where I want n records satisfying a filter condition. I don't really need the ability to paginate. Here's a partial workaround for anyone who might be facing this issue.

In the pseudocode below I set only MaxItems to the desired n in PaginationConfig, i.e. I do not set the PageSize parameter. This makes the first page itself yield the required number of values.


page_iterator = ddb_client.get_paginator("query").paginate(
    PaginationConfig={"MaxItems": n},
    **other_arguments,
)

for page in page_iterator:
    for return_value in page["Items"]:
        print(return_value)

twitu avatar Jul 29 '21 23:07 twitu

Hi @twitu,

Thanks for the additional information. Leaving this marked as a bug for now, as it appears #369 may need to be addressed before anything can be done about this.

stobrien89 avatar Aug 04 '21 18:08 stobrien89

I don't see why that's the case. It's possible to handle encoding decimal.Decimal objects without making the option of not using decimals a generally available feature, especially since the failing function, TokenEncoder.encode, is documented as producing an "opaque string". Presumably one could extend TokenEncoder._encode, or provide a subclass of JSONEncoder that handles decimals.

bwo avatar Aug 20 '21 18:08 bwo

$ git diff
diff --git a/botocore/paginate.py b/botocore/paginate.py
index b08c7ed8b..501f8bf74 100644
--- a/botocore/paginate.py
+++ b/botocore/paginate.py
@@ -15,6 +15,7 @@ from itertools import tee
 
 from botocore.compat import six
 
+from decimal import Decimal
 import jmespath
 import json
 import base64
@@ -27,6 +28,13 @@ from botocore.utils import set_value_from_jmespath, merge_dicts
 log = logging.getLogger(__name__)
 
 
+class DecimalEncoder(json.JSONEncoder):
+    def default(self, o):
+        if isinstance(o, Decimal):
+            return str(o)
+        return json.JSONEncoder.default(self, o)
+
+
 class TokenEncoder(object):
     """Encodes dictionaries into opaque strings.
 
@@ -52,7 +60,7 @@ class TokenEncoder(object):
         try:
             # Try just using json dumps first to avoid having to traverse
             # and encode the dict. In 99.9999% of cases this will work.
-            json_string = json.dumps(token)
+            json_string = json.dumps(token, cls=DecimalEncoder)
         except (TypeError, UnicodeDecodeError):
             # If normal dumping failed, go through and base64 encode all bytes.
             encoded_token, encoded_keys = self._encode(token, [])

seems to work

(ETA: well, you'd also want to update the second call to json.dumps.)
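The patch above only covers encoding, and because it emits str(o), a decoded token would carry the value as a plain string. Whether that matters when resuming from a token is not settled in this thread. As a sketch of one way to round-trip the type, assuming a made-up "__decimal__" tag (the class and function names below are hypothetical and not part of botocore):

```python
import json
from decimal import Decimal


class TaggedDecimalEncoder(json.JSONEncoder):
    """Hypothetical encoder: writes a Decimal as {"__decimal__": "<digits>"}."""

    def default(self, o):
        if isinstance(o, Decimal):
            return {"__decimal__": str(o)}
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


def restore_decimals(obj):
    """object_hook for json.loads: turn the tagged form back into a Decimal."""
    if set(obj) == {"__decimal__"}:
        return Decimal(obj["__decimal__"])
    return obj


# Hypothetical token contents; the attribute names are made up.
token = {"ExclusiveStartKey": {"pk": Decimal("3"), "sk": "a"}}

encoded = json.dumps(token, cls=TaggedDecimalEncoder)
decoded = json.loads(encoded, object_hook=restore_decimals)
```

The tag costs a few extra bytes per number, but since the token is meant to be an opaque string, its internal layout should not matter to callers.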

bwo avatar Aug 20 '21 19:08 bwo

Up. Also having this issue. Unfortunately, the workaround proposed by @twitu does not work with a large number of items, since boto seems to paginate on its own. However, it does work for a small number of items.

Any update, @stobrien89? It seems https://github.com/boto/boto3/issues/369 could last another 7 years :)

Anyway, thanks for your time and sweat.

antoinejeannot avatar Apr 05 '22 12:04 antoinejeannot
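
For larger result sets, one option that avoids the paginator's token encoding entirely is to follow LastEvaluatedKey by hand and pass it back as ExclusiveStartKey. The following is only a sketch under that assumption; query_first_n is a hypothetical helper, and client is expected to be something like the ddb_client from the reproduction steps.

```python
def query_first_n(client, n, **query_kwargs):
    """Collect up to n items by following LastEvaluatedKey manually.

    `client` is any object with a DynamoDB-style query(**kwargs) method.
    The key is handed back to the client as-is and never passes through
    json.dumps, so Decimal values in it are not a problem here.
    """
    items = []
    kwargs = dict(query_kwargs)  # copy so the caller's arguments are untouched
    while len(items) < n:
        response = client.query(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key
    return items[:n]


# Usage (placeholders as in the reproduction steps):
# items = query_first_n(ddb_client, 10, TableName="<TABLE-NAME>", ...)
```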