
Adding support for float in DynamoDB type serializer

Open andrewjroth opened this issue 4 years ago • 5 comments

The existing implementation of type serialization seems to deliberately not support the Python float data type, resulting in errors. Often, data returned from other APIs, including AWS APIs, will be dropped into a DynamoDB table as an item. Python's float is a standard data type and is commonly returned from other sources.

As discussed in #665, while some other workarounds exist, the optimal solution is to properly handle float data types. Python floats are known to be imprecise, so inexact rounding in this process should not be an issue. Just in case, I've also updated the documentation to make it clear that float data types may not be precisely represented.

Please let me know if I should add additional details or modifications to meet the standards for boto3.

andrewjroth avatar Dec 17 '20 21:12 andrewjroth

Hi @andrewjroth, thanks for the PR! Unfortunately, I don't think this covers the failure cases in #665. We potentially have a way to fix this issue once boto3 is moved entirely onto Python 3, but the lack of precision control for floats means users end up with values that can't round trip from DynamoDB.

The reason we don't support float today is that values requiring significant decimal precision can't be represented exactly as Python floats. When we go to write a value to DynamoDB, it doesn't actually match the user's input. You can see the issue is still present with a modification to your test case below:

def test_serialize_float(self):
    val = 0.999999999999
    self.assertEqual(
        self.serializer.serialize(val), {'N': str(val)})

You should end up with something like:

======================================================================
FAIL: test_serialize_float (tests.unit.dynamodb.test_types.TestSerializer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nateprewitt/Work/OpenSource/boto3/tests/unit/dynamodb/test_types.py", line 80, in test_serialize_float
    self.serializer.serialize(val), {'N': str(val)})
AssertionError: {'N': '0.99999999999900002212172012150404043496'} != {'N': '0.999999999999'}
- {'N': '0.99999999999900002212172012150404043496'}
+ {'N': '0.999999999999'}
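
The mismatch can be reproduced without boto3. The following is a minimal sketch, assuming the float is converted through a 38-digit decimal context (the precision quoted elsewhere in this thread); `ctx`, `from_float`, and `from_str` are illustrative names, not part of boto3.

```python
from decimal import Context, Decimal

# Assumption: a 38-digit context, the precision discussed in this thread.
ctx = Context(prec=38)

val = 0.999999999999

# Converting the float directly exposes its binary approximation.
from_float = ctx.create_decimal_from_float(val)

# Going through str() uses the short repr the user typed.
from_str = Decimal(str(val))

print(from_float)              # a long expansion, not 0.999999999999
print(from_str)                # 0.999999999999
print(from_float == from_str)  # False
```

Both Decimal values convert back to the same float, but they are different decimal numbers, which is why the assertion against `str(val)` fails.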

nateprewitt avatar Dec 17 '20 23:12 nateprewitt

Hi @nateprewitt, thanks for the quick response!

Usage of float in Python is known to be inexact (as documented by Python, link below). Wouldn't it be reasonable to expect users to match the precision for floats when comparing to boto3's DynamoDB Type?

The test case could be:

    def test_serialize_float(self):
        for val in [0.999999999999, 0.1, 0.25,
                    1.234567890123456789012345678901234567890]:
            self.assertEqual(
                self.serializer.serialize(val), {'N': format(val, '.38g')})

Documentation for DynamoDB would explain this:

  **Note:**  In Python, the 'float' data type provides an approximation 
  of the computer hardware representation of binary fractions.  
  This issue is explained in the Python documentation section titled `Floating Point Arithmetic`_.
  Because of this, float values are converted to Decimal using create_decimal_from_float_ 
  and may be an inexact rounded representation of the value.
  Serialization of float uses decimal precision set to 38 places.

.. _Floating Point Arithmetic: https://docs.python.org/3/tutorial/floatingpoint.html
.. _create_decimal_from_float: https://docs.python.org/3/library/decimal.html#decimal.Context.create_decimal_from_float

This would not break any current users (as existing applications get an error if float is used) and would make it easier for future users to drop data from other sources into DynamoDB, as explained in the referenced issue.

Thanks for your consideration!

andrewjroth avatar Dec 18 '20 15:12 andrewjroth

Thanks for the additional tests, @andrewjroth. Unfortunately we still end up with unexpected values like:

>>> val = 0.999999999999 # value we want in dynamodb
>>> self.serializer.serialize(val)
{'N': '0.99999999999900002212172012150404043496'}
>>> {'N': format(val, '.38g')}
{'N': '0.99999999999900002212172012150404043496'}

The main issue here is many users of boto3 aren't familiar with the caveats of Python floats. The sharp edges don't surface with basic tests like the original one in this PR, resulting in code reaching production that'll often cause unexpected damage to datasets in DynamoDB. We've tried to do education campaigns in the past, but there will always be new users who pass floats without reading the docs.

This issue becomes especially prevalent as code bases start doing float division. You can produce a value that, once written to DynamoDB, doesn't match the input, and a subsequent get_item call will return a value that doesn't pass basic validation checks. There are also cases where you become unable to delete items from DynamoDB because the values don't match. This ended up being one of the most common stumbling points in Boto2, which led us to drop float support.
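
A rough simulation of the key mismatch described here, using a plain dict as a stand-in for a table. This is only a sketch: `fake_table` is hypothetical and does not model real DynamoDB behavior, and the 38-digit context is an assumption taken from the precision quoted in this thread.

```python
from decimal import Context, Decimal

# Assumption: 38-digit precision, as quoted in this thread.
ctx = Context(prec=38)

# Stand-in for a table keyed on a number attribute; not a real DynamoDB client.
fake_table = {}

total = 0.1 + 0.2  # float arithmetic gives 0.30000000000000004, not 0.3

# What a float-accepting serializer would end up storing as the key.
stored_key = ctx.create_decimal_from_float(total)
fake_table[stored_key] = {"payload": "example"}

# Later, the caller looks the item up with the value they believe they wrote.
intended_key = Decimal("0.3")

print(total == 0.3)                # False
print(intended_key in fake_table)  # False
print(stored_key in fake_table)    # True
```

The item is only reachable through the exact expanded value, which the caller typically never sees.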

We definitely agree the current Decimal approach isn't pythonic or an ideal interface, but it's the only way we can ensure data consistency right now. We're looking for any opportunities we can to improve this in a future iteration of Boto3.
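
For reference, one common way to follow the Decimal approach is to convert floats explicitly before writing. The helper below is a minimal sketch, not part of boto3; `floats_to_decimal` is a hypothetical name, and going through `str()` is a deliberate choice so the stored value is the short repr of the float.

```python
from decimal import Decimal


def floats_to_decimal(obj):
    """Recursively replace float values with Decimal built from their str() form."""
    if isinstance(obj, float):
        # str() yields the short repr, e.g. '0.938', so the Decimal matches it.
        # Note: NaN and infinity would also convert and need separate handling.
        return Decimal(str(obj))
    if isinstance(obj, dict):
        return {key: floats_to_decimal(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [floats_to_decimal(value) for value in obj]
    return obj


item = floats_to_decimal({"data1": {"data-level2": 0.938}, "scores": [0.1, 2]})
print(item)
```

The converted item could then be passed as `Item` to `table.put_item`, and the caller, not the library, decides which decimal value each float is meant to represent.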

nateprewitt avatar Dec 18 '20 23:12 nateprewitt

I see the primary concern is a loss of precision by new users who may unknowingly expect better precision from float values than the data type is actually capable of. The problem with not being able to delete items only applies when a float value is used as an index key. Is this right?

For users who are aware of this limitation and are willing to accept it, would it be reasonable to add a flag for the DynamoDBHighLevelResource to allow floats? This would require the user to accept the risk of using floats when creating the resource interface.

I am thinking the user code would look something like this:

import boto3

dynamodb = boto3.resource('dynamodb', allow_floats=True)
table = dynamodb.Table('name')
response = table.put_item(
    Item={ "data1": { "data-level2": 0.938 } }
)

This might take a bit of work, so I wanted to ask if this was acceptable or not before starting on it. The default implementation would, of course, be reverted back to an error when floats are serialized.
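
To make the proposed behavior concrete, here is a standalone sketch of the opt-in logic. Everything in it is hypothetical: `serialize_number` and its `allow_floats` parameter are illustrative only, `boto3.resource` does not accept such a flag today, and the 38-digit context is assumed from the precision mentioned earlier in this thread.

```python
from decimal import Context, Decimal

# Assumption: 38-digit precision, as mentioned in this thread.
_CTX = Context(prec=38)


def serialize_number(value, allow_floats=False):
    """Hypothetical helper returning a DynamoDB-style {'N': ...} mapping."""
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it before the int check.
        raise TypeError("bool is not treated as a number here")
    if isinstance(value, float):
        if not allow_floats:
            # Default: keep rejecting floats, as the serializer does today.
            raise TypeError("float is not supported unless allow_floats=True")
        # Opt-in: accept the inexact, rounded representation.
        value = _CTX.create_decimal_from_float(value)
    elif isinstance(value, int):
        value = Decimal(value)
    elif not isinstance(value, Decimal):
        raise TypeError(f"unsupported number type: {type(value).__name__}")
    return {"N": str(value)}


print(serialize_number(0.938, allow_floats=True))
```

With the default `allow_floats=False`, passing `0.938` raises `TypeError`, so existing users would see no change in behavior.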

Thanks for your feedback!

andrewjroth avatar Dec 21 '20 21:12 andrewjroth

Related to #369 and #665

jonapich avatar May 24 '22 09:05 jonapich

Hi @andrewjroth, thanks for creating this PR and your patience on hearing back. After bringing this up for further discussion with the team, we've decided to close this PR, as it implements functionality that requires a new major version and won't be considered for the current major version. This issue will continue to be tracked in https://github.com/boto/boto3/issues/665.

RyanFitzSimmonsAK avatar Sep 21 '23 19:09 RyanFitzSimmonsAK