PynamoDB icon indicating copy to clipboard operation
PynamoDB copied to clipboard

Binary Attributes Double-base64 Encoded

Open ewalker11 opened this issue 9 years ago • 13 comments

The botocore client, called via _convert_to_request_dict, is performing another round of base64 encoding on binary fields. This is on top of the existing round of encoding done by the BinaryAttribute's serialize() method.

This is resulting in double base64 encoded values stored in Dynamo, which is why we have to undo one of the decodes via _convert_binary.

IOW, all binary data that has ever been persisted using the BinaryAttribute in any version of Pynamo is sitting double-base64-encoded in Dynamo somewhere.

ewalker11 avatar May 12 '16 02:05 ewalker11

There seems to be an interesting edge case: Using a BinaryAttribute as range key is causing it to get converted into a string attribute in DynamoDB, thus getting stored without the doubled bas64-encoding.

Dunedan avatar Feb 23 '18 10:02 Dunedan

With version 4.2.0 this issue persists. Is this planned to be fixed or will this remain for compatibility?

pleroux0 avatar Nov 20 '19 06:11 pleroux0

I'm not aware of a "plan" for the next release.

@garrettheel Given we have this issue, do you see any reason not to move the existing behavior to a "legacy" attribute class for PynamoDB 5.x? (assuming someone will contribute this)

ikonst avatar Nov 20 '19 16:11 ikonst

One easy (though probably dangerous) fix is to use a custom attribute to encode these values just as bytes:

class PickleAttribute(BinaryAttribute):
    def serialize(self, value: Any) -> Any:
        return pickle.dumps(value)

    def deserialize(self, value: Any) -> Any:
        return pickle.loads(value)

I think I am relying on botocore to base64 encode bytes, which could blow up if that behavior changes, but works for me for now.

iwsmith avatar Aug 18 '20 21:08 iwsmith

Whoops, sorry for missing this one. I'm a little wary of changing this in such a way that it would blow up without folks doing a data migration. My preference would be to allow folks to opt-in to the fix and mark the existing behavior as legacy (possibly with a deprecation warning).

We could possibly take an argument on attribute instantiation to control this, or create an entirely new attribute. My worry with renaming the existing attr to LegacyBinaryAttribute and reclaiming BinaryAttribute is that many wont read the release notes properly and this will break in a non-obvious way (and when it hits production).

garrettheel avatar Aug 18 '20 22:08 garrettheel

What about creating a BinaryAttributeV2, leaving BinaryAttribute alone so existing users don't get broken by an upgrade, but mark it as deprecated and update the documentation to point people at BinaryAttributeV2?

iwsmith avatar Aug 19 '20 17:08 iwsmith

Are there any plans to do something here? I want to compress a string and save it, but much of the compression is lost by base64 double-encoding...

MrCrumbs avatar Sep 18 '23 10:09 MrCrumbs

This is addressed in the 6.0 branch (which I should really hurry up and release at this point...).

ikonst avatar Sep 18 '23 12:09 ikonst

Awesome! Waiting for the release, thanks!

MrCrumbs avatar Sep 18 '23 12:09 MrCrumbs

Hi @ikonst , is there any plan to release 6.0 to fix double encoding issue?

Chouvic avatar Nov 13 '23 12:11 Chouvic

Yes, might try to get it done this coming long weekend

ikonst avatar Nov 20 '23 17:11 ikonst

Hey @ikonst, I see the new release doesn't contain the fix with legacy_encoding, right? Am I missing something?

MrCrumbs avatar Nov 29 '23 08:11 MrCrumbs

It's in the master branch. I've made that release off the 5.x branch.

ikonst avatar Nov 29 '23 17:11 ikonst