PynamoDB

Getting a ValueError when deserializing a valid UTCDateTimeAttribute

Open • ph3ne opened this issue 3 years ago • 9 comments

Hello,

I'm getting stuck with a ValueError when I use the get() method on the following model:

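from os import environ

from pynamodb.attributes import (
    BooleanAttribute,
    NumberAttribute,
    UnicodeAttribute,
    UTCDateTimeAttribute,
)
from pynamodb.models import Model


class Job(Model):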
    class Meta:
        table_name = environ.get('TABLE_NAME')
        aws_access_key_id = environ.get('AWS_ACCESS_KEY_ID')
        aws_secret_access_key = environ.get('AWS_SECRET_ACCESS_KEY')
        region = 'eu-west-3'

    id = NumberAttribute(hash_key=True)
    name = UnicodeAttribute()
    tenant = UnicodeAttribute()
    query = UnicodeAttribute()
    schedule = UnicodeAttribute()
    active = BooleanAttribute()
    created_by = UnicodeAttribute()
    modified_by = UnicodeAttribute()
    creation_date = UTCDateTimeAttribute()
    modification_date = UTCDateTimeAttribute()

The item I'm trying to retrieve has the following values:

{
  "id": {
    "N": "1"
  },
  "tenant": {
    "S": "TRN"
  },
  "active": {
    "BOOL": true
  },
  "schedule": {
    "S": "every monday"
  },
  "query": {
    "S": "select * from *"
  },
  "created_by": {
    "S": "Lucas"
  },
  "modification_date": {
    "S": "2021-09-06 17:04:21.899683"
  },
  "modified_by": {
    "S": "Lucas"
  },
  "name": {
    "S": "Test Job 1"
  },
  "creation_date": {
    "S": "2021-09-06 17:04:21.899683"
  }
}

But I get the following stack trace:

Traceback (most recent call last):
  File "/mnt/c/Users/l.pierru/Documents/Focal Microservices/infor-data-lake-to-s3/test_scan.py", line 55, in <module>
    job = Job.get(2)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/models.py", line 542, in get
    return cls.from_raw_data(item_data)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/models.py", line 556, in from_raw_data
    return cls._instantiate(data)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 400, in _instantiate
    AttributeContainer._container_deserialize(instance, attribute_values)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 380, in _container_deserialize
    value = attr.deserialize(attr.get_value(attribute_value))
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 697, in deserialize
    return self._fast_parse_utc_date_string(value)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 717, in _fast_parse_utc_date_string
    raise ValueError("Datetime string '{}' does not match format '{}'".format(date_string, DATETIME_FORMAT))
ValueError: Datetime string '000002021-09-06 17:31:31.429277' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'

I already tried creating the item multiple times and checking for hidden characters and such. I don't know where the five zeroes before the year come from. I can't see them when using the AWS CLI or the website editor, so I think they're coming from the PynamoDB module.

Any help on this would be appreciated.

ph3ne, Sep 06 '21 15:09

It has neither a T in the middle nor a timezone. I'm not sure what encoded this value, but perhaps it's best to just map it to a UnicodeAttribute?
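For reference, the tracebacks show that UTCDateTimeAttribute checks values against '%Y-%m-%dT%H:%M:%S.%f%z'. A minimal standard-library sketch of producing a string in that shape from a datetime, assuming naive datetimes are meant as UTC (to_utc_attribute_string and matches_format are hypothetical helpers, not PynamoDB API):

```python
from datetime import datetime, timezone

# Format string copied from the ValueError message in the tracebacks.
DATETIME_FORMAT = '%Y-%m-%dT%H:%M:%S.%f%z'


def to_utc_attribute_string(dt: datetime) -> str:
    """Render a datetime as e.g. '2021-09-06T17:04:21.899683+0000'."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime(DATETIME_FORMAT)


def matches_format(value: str) -> bool:
    """Rough check: does strptime accept the value with DATETIME_FORMAT?"""
    try:
        datetime.strptime(value, DATETIME_FORMAT)
    except ValueError:
        return False
    return True


stored = '2021-09-06 17:04:21.899683'  # value from the item above: no 'T', no offset
print(matches_format(stored))  # False
good = to_utc_attribute_string(datetime(2021, 9, 6, 17, 4, 21, 899683))
print(good)  # 2021-09-06T17:04:21.899683+0000
print(matches_format(good))  # True
```

strptime is only an approximation here; PynamoDB's own parser may be stricter about the exact shape it accepts.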

ikonst, Sep 07 '21 00:09

Well spotted! I didn't see the T was missing in the format :)

The issue is still the same, though. Updated traceback:

Traceback (most recent call last):
  File "/mnt/c/Users/l.pierru/Documents/Focal Microservices/infor-data-lake-to-s3/test_scan.py", line 41, in <module>
    for job in Job.batch_get(ids):
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/models.py", line 367, in batch_get
    yield cls.from_raw_data(batch_item)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/models.py", line 556, in from_raw_data
    return cls._instantiate(data)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 400, in _instantiate
    AttributeContainer._container_deserialize(instance, attribute_values)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 380, in _container_deserialize
    value = attr.deserialize(attr.get_value(attribute_value))
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 697, in deserialize
    return self._fast_parse_utc_date_string(value)
  File "/home/lpierru/.local/share/virtualenvs/infor-data-lake-to-s3-_H1zLWyn/lib/python3.9/site-packages/pynamodb/attributes.py", line 717, in _fast_parse_utc_date_string
    raise ValueError("Datetime string '{}' does not match format '{}'".format(date_string, DATETIME_FORMAT))
ValueError: Datetime string '000002021-09-07T10:04:58.728100' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'

ph3ne, Sep 07 '21 08:09

Still no timezone. Also, the year is padded with zeroes. What are you serializing this with? Why do you want to serialize it with UTCDateTimeAttribute? You can also implement your own attribute to parse whatever format you serialized.

ikonst, Sep 07 '21 19:09

Getting the same error with UTCDateTimeAttribute:

ValueError: Datetime string '00000002022-02-26T18:21:33.034Z' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'

The DynamoDB table has these values stored as strings in the created_at_utc field:

2022-02-26T18:21:33.034Z

The format generated by the Python script is exactly the same as that of the modified_at_utc column, which contains a timestamp generated by Step Functions with $$.Execution.StartTime:

# $$.Execution.StartTime timestamp
2022-02-26T18:21:57.318Z

For some reason the extra zeros are being added.
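The number of zeros lines up with the stored string being left-padded to a fixed width before validation: the 26-character values in this thread gain five zeros and the 24-character value gains seven. The sketch below reproduces those exact error strings with str.zfill, under the assumption (not confirmed in this thread) that PynamoDB's fast parser pads its input to the 31-character length of a well-formed '+0000' timestamp; pad_like_parser is a hypothetical name. If that holds, the zeros are a symptom of the wrong format rather than something stored in the table.

```python
# Hypothetical reproduction: assumes the parser zero-pads its input to the
# 31-character width of a string like '2021-09-06T17:04:21.899683+0000'.
EXPECTED_LENGTH = len('2021-09-06T17:04:21.899683+0000')  # 31


def pad_like_parser(raw: str) -> str:
    """Left-pad with '0' up to EXPECTED_LENGTH, as str.zfill does."""
    return raw.zfill(EXPECTED_LENGTH)


for raw in ('2021-09-06 17:31:31.429277', '2022-02-26T18:21:33.034Z'):
    print(len(raw), pad_like_parser(raw))
# 26 000002021-09-06 17:31:31.429277
# 24 00000002022-02-26T18:21:33.034Z
```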

chrisammon3000, Feb 26 '22 18:02

I just had this issue as well while migrating similar items into an existing PynamoDB model. It's not an ideal fix, but I was able to work around it as follows:

import dateutil.parser

# item is a dict whose keys and values match the attributes of my PynamoDB model (Thing).
# Remove the raw timestamp string from item and parse it into a datetime.
new_timestamp = dateutil.parser.parse(item.pop('timestamp'))

# This line used to raise an error because of the timestamp;
# now that timestamp has been popped off, it doesn't.
thing = Thing(**item)
# Then put the timestamp back on as a datetime once the model has been built.
thing.timestamp = new_timestamp
thing.save()

peoplespete, Apr 28 '22 15:04

I have a similar problem:

ValueError: Datetime string '2022-05-09T12:04:56.479663+00:00' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'

So I'm getting this error with a valid ISO 8601 format 🤯
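This string is valid ISO 8601, but it uses the '+00:00' offset form and is 32 characters long, whereas the format in the message renders a UTC offset as '+0000' (31 characters). Assuming the attribute only accepts that exact shape, a hedged standard-library sketch that rewrites such a value (normalize_offset is a hypothetical helper):

```python
from datetime import datetime, timezone


def normalize_offset(value: str) -> str:
    """Re-render an ISO 8601 string such as '...+00:00' as '...+0000' in UTC."""
    parsed = datetime.fromisoformat(value)  # Python 3.7+ accepts the '+00:00' form
    if parsed.tzinfo is None:
        # Assumption for this sketch: values without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.%f%z')


iso_value = '2022-05-09T12:04:56.479663+00:00'
print(len(iso_value))  # 32
print(normalize_offset(iso_value))  # 2022-05-09T12:04:56.479663+0000
```

Where you control the writer, it is likely simpler to assign a datetime object to the attribute and let PynamoDB serialize it, rather than storing a hand-formatted string.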

umatbro, May 09 '22 12:05

I am getting the same error when using from_raw_data: it adds some extra zeros when converting the date into the PynamoDB model.

WaGjUb, Oct 14 '22 10:10

In each case it seems that the data was written to DynamoDB by some means other than PynamoDB's UTCDateTimeAttribute. We probably don't want to write a flexible parser to handle what is effectively data corruption.

You might try overriding deserialize on your model and, once all data has been converted to the "good" format, removing the override from your code:

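from typing import Any

# Temporary override on your Model subclass; remove it once all stored data is fixed.
# has_bad_format and convert_bad_to_good are helpers you write yourself.
def deserialize(self, values: Any) -> Any:
    if values is not None:
        if has_bad_format(values['my_time_attr']['S']):
            values['my_time_attr']['S'] = convert_bad_to_good(values['my_time_attr']['S'])
    return super().deserialize(values)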
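has_bad_format and convert_bad_to_good are left undefined in the suggestion. One possible standard-library sketch, covering the shapes reported in this thread (a space instead of T, a trailing Z, a '+00:00' offset) and assuming that values without an offset are already UTC; both bodies are hypothetical, not part of PynamoDB:

```python
from datetime import datetime, timezone

GOOD_FORMAT = '%Y-%m-%dT%H:%M:%S.%f%z'


def has_bad_format(value: str) -> bool:
    """True unless value looks like '2021-09-06T17:04:21.899683+0000'."""
    # Require the exact 31-character '+0000' shape, then let strptime check the rest.
    if len(value) != 31 or not value.endswith('+0000'):
        return True
    try:
        datetime.strptime(value, GOOD_FORMAT)
    except ValueError:
        return True
    return False


def convert_bad_to_good(value: str) -> str:
    """Parse the legacy shapes seen in this thread and re-render them as GOOD_FORMAT."""
    text = value.strip()
    if text.endswith('Z'):
        # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
        text = text[:-1] + '+00:00'
    dt = datetime.fromisoformat(text)  # also accepts a space instead of 'T'
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return dt.astimezone(timezone.utc).strftime(GOOD_FORMAT)
```

Note that a millisecond value such as '2022-02-26T18:21:33.034Z' comes out with six fractional digits ('...33.034000+0000'), since %f always renders microseconds.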

ikonst, Oct 15 '22 22:10

I'm just trying out PynamoDB, but this immediately caught me off guard, as I use a format like 2021-10-07T17:01:04.016Z for all my datetime fields. I ended up writing my own custom attribute, which was surprisingly easy to do. It's not the most efficient, but it works well for me:

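from datetime import datetime, timezone
from typing import Union

from dateutil.parser import parse
from pynamodb.attributes import Attribute
from pynamodb.constants import STRING


class AWSISODateTimeAttribute(Attribute[datetime]):
    """Stores datetimes as UTC ISO 8601 strings such as 2021-10-07T17:01:04.016Z."""
    attr_type = STRING

    def serialize(self, value: Union[datetime, str]) -> str:
        # Accept either a datetime or a string that dateutil can parse.
        t: datetime = parse(value) if isinstance(value, str) else value
        # Treat naive datetimes as UTC.
        if t.tzinfo is None:
            t = t.replace(tzinfo=timezone.utc)
        return t.astimezone(timezone.utc).isoformat(timespec='milliseconds').replace("+00:00", "Z")

    def deserialize(self, value: str) -> datetime:
        # Return a timezone-aware datetime, matching the Attribute[datetime] declaration.
        return parse(value)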
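If you would rather avoid the dateutil dependency, the same conversion can be sketched with the standard library alone. to_aws_iso and from_aws_iso are hypothetical names that could back the serialize and deserialize methods of an attribute like the one above; note that datetime.fromisoformat accepts far fewer input formats than dateutil.parser.parse.

```python
from datetime import datetime, timezone
from typing import Union


def _parse_aws_iso(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    if value.endswith('Z'):
        value = value[:-1] + '+00:00'
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return parsed


def to_aws_iso(value: Union[datetime, str]) -> str:
    """Serialize to the '2021-10-07T17:01:04.016Z' shape (UTC, milliseconds)."""
    t = _parse_aws_iso(value) if isinstance(value, str) else value
    if t.tzinfo is None:
        t = t.replace(tzinfo=timezone.utc)
    return t.astimezone(timezone.utc).isoformat(timespec='milliseconds').replace('+00:00', 'Z')


def from_aws_iso(value: str) -> datetime:
    """Deserialize back into a timezone-aware UTC datetime."""
    return _parse_aws_iso(value).astimezone(timezone.utc)
```

Values with millisecond precision round-trip unchanged; finer precision is truncated to milliseconds by isoformat(timespec='milliseconds').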

half2me, Dec 21 '22 22:12