PynamoDB
Single Corrupt Record Breaks Scan
Our API doesn't currently expose all of the fields that exist on our DynamoDB records, so a number of employees manually create and edit records in the DynamoDB tables by hand.
Occasionally they forget to add required fields, or enter the wrong data for a field. These employees are in a separate department so sometimes they're not aware of new fields added recently.
Some of our tables only contain a few hundred records, and our production systems routinely scan the full table using PynamoDB. This means that a single corrupt record (i.e. one missing a field) raises an exception and prevents us from processing the remaining valid records.
Is there a way in PynamoDB to defer deserialization until later on, i.e. allow the scan to happen, but deserialize each record individually so that we can catch the exception and ignore the problem records?
Would really appreciate some help on this. We have some scope to help improve PynamoDB in this scenario if there currently isn't a way to skip over problem records.
Hi, I don't know if it applies to your use case, but there is a possibility to override the default behavior of the PynamoDB model to accept `None` for specific attributes (by setting `null=True`): https://pynamodb.readthedocs.io/en/latest/tutorial.html?highlight=null#defining-model-attributes You can then iterate over the gathered records and find out which are corrupted.
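For illustration, a minimal sketch of what that looks like; the model, table, and attribute names here are made up, and this assumes pynamodb is installed:

```python
from pynamodb.models import Model
from pynamodb.attributes import UnicodeAttribute


class EmployeeRecord(Model):
    """Hypothetical model: null=True makes an attribute optional,
    so records missing it deserialize with the value None instead
    of raising during a scan."""

    class Meta:
        table_name = "employee-records"  # made-up table name

    employee_id = UnicodeAttribute(hash_key=True)
    # Optional attribute: manually created records that omit it no
    # longer break deserialization.
    department = UnicodeAttribute(null=True)
```

This only helps with missing fields, not with fields whose contents are malformed.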
Thanks for your suggestion @mateuszciosek, but unfortunately it isn't just missing fields that are the problem.
We have more complex Map fields that aren't formatted correctly when edited manually in the aws console.
Trying to make the PynamoDB model less strict isn't really what we're after; we'd rather just skip over the dodgy records.
Thanks again for your help though.
That's a great question. Perhaps we can add a `try: bool = False` parameter that'll cause the iteration to be over `Union[Model, DeserializationFailure]`, where `DeserializationFailure` could include:
- `raw_data: Dict[str, Any]`
- `deserialization_exception: Exception`
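A rough sketch of the shape being proposed: `DeserializationFailure` follows the fields listed above, while `iter_with_failures` is a hypothetical stand-in for the iteration change, not an existing PynamoDB API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, Union


@dataclass
class DeserializationFailure:
    """Yielded in place of a model when a record can't be deserialized."""
    raw_data: Dict[str, Any]
    deserialization_exception: Exception


def iter_with_failures(
    raw_items: Iterable[Dict[str, Any]],
    deserialize: Callable[[Dict[str, Any]], Any],
) -> Iterator[Union[Any, DeserializationFailure]]:
    """Yield a deserialized item per record, or a DeserializationFailure
    wrapping the raw data and the exception when deserialization fails."""
    for raw in raw_items:
        try:
            yield deserialize(raw)
        except Exception as exc:
            yield DeserializationFailure(raw_data=raw, deserialization_exception=exc)


# Demo: the second item lacks the required key, so it comes back wrapped.
results = list(iter_with_failures([{"name": "a"}, {}], lambda raw: raw["name"]))
```

The caller can then branch on `isinstance(item, DeserializationFailure)` inside the loop.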
Another thing I was thinking about is allowing models to be defined as lazily deserializable, mostly for deserialization performance of non-corrupt models, but it might help here too.
@ikonst that sounds like a good idea; then in the loop I can just check what type the iterated item is.
Was wondering, as a short-term fix, if I could get away with overriding `from_raw_data` in my model class and putting a `try ... except` around the call to `super().from_raw_data(...)`.
Yeah, that might work. Make sure you keep some way in your model to indicate that it wasn't initialized.
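That workaround might look roughly like the following. The `SafeScanMixin` name is illustrative, and the base class here is a self-contained stand-in: in real code it would be `pynamodb.models.Model` (check whether your PynamoDB version still exposes `from_raw_data`, as the classmethod API has changed across releases):

```python
class FakeModel:
    """Stand-in for pynamodb.models.Model so this sketch is runnable.
    Its from_raw_data raises KeyError on a corrupt item, much like a
    failed attribute deserialization would."""

    @classmethod
    def from_raw_data(cls, data):
        obj = cls.__new__(cls)
        obj.name = data["name"]  # raises KeyError if the field is missing
        return obj


class SafeScanMixin:
    """Wraps the parent's from_raw_data so a corrupt record comes back
    flagged instead of raising mid-scan."""

    corrupt = False  # callers can check this flag to skip bad records

    @classmethod
    def from_raw_data(cls, data):
        try:
            return super().from_raw_data(data)
        except Exception:
            instance = cls.__new__(cls)  # bypass normal initialization
            instance.corrupt = True      # mark as not properly initialized
            instance.raw_data = data     # keep the raw item for debugging
            return instance


class MyModel(SafeScanMixin, FakeModel):
    pass
```

A scan loop can then do `if item.corrupt: continue` to skip the problem records, per the note above about keeping a way to tell the record wasn't initialized.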