Binary Attributes Double-base64 Encoded
The botocore client, called via _convert_to_request_dict, is performing another round of base64 encoding on binary fields. This is on top of the existing round of encoding done by the BinaryAttribute's serialize() method.
This is resulting in double base64 encoded values stored in Dynamo, which is why we have to undo one of the decodes via _convert_binary.
IOW, all binary data that has ever been persisted using the BinaryAttribute in any version of Pynamo is sitting double-base64-encoded in Dynamo somewhere.
There seems to be an interesting edge case: Using a BinaryAttribute as range key is causing it to get converted into a string attribute in DynamoDB, thus getting stored without the doubled bas64-encoding.
With version 4.2.0 this issue persists. Is this planned to be fixed or will this remain for compatibility?
I'm not aware of a "plan" for the next release.
@garrettheel Given we have this issue, do you see any reason not to move the existing behavior to a "legacy" attribute class for PynamoDB 5.x? (assuming someone will contribute this)
One easy (though probably dangerous) fix is to use a custom attribute to encode these values just as bytes:
class PickleAttribute(BinaryAttribute):
def serialize(self, value: Any) -> Any:
return pickle.dumps(value)
def deserialize(self, value: Any) -> Any:
return pickle.loads(value)
I think I am relying on botocore to base64 encode bytes, which could blow up if that behavior changes, but works for me for now.
Whoops, sorry for missing this one. I'm a little wary of changing this in such a way that it would blow up without folks doing a data migration. My preference would be to allow folks to opt-in to the fix and mark the existing behavior as legacy (possibly with a deprecation warning).
We could possibly take an argument on attribute instantiation to control this, or create an entirely new attribute. My worry with renaming the existing attr to LegacyBinaryAttribute and reclaiming BinaryAttribute is that many wont read the release notes properly and this will break in a non-obvious way (and when it hits production).
What about creating a BinaryAttributeV2, leaving BinaryAttribute alone so existing users don't get broken by an upgrade, but mark it as deprecated and update the documentation to point people at BinaryAttributeV2?
Are there any plans to do something here? I want to compress a string and save it, but much of the compression is lost by base64 double-encoding...
This is addressed in the 6.0 branch (which I should really hurry up and release at this point...).
Awesome! Waiting for the release, thanks!
Hi @ikonst , is there any plan to release 6.0 to fix double encoding issue?
Yes, might try to get it done this coming long weekend
Hey @ikonst, I see the new release doesn't contain the fix with legacy_encoding, right? Am I missing something?
It's in the master branch. I've made that release off the 5.x branch.