
Escape semicolons

Open izhan opened this issue 9 years ago • 6 comments

I am currently running Stream-Framework with production data, and recently ran into an issue with how Stream-Framework serializes Aggregated Activities.

In aggregated_activity_serializer.py, we have the line check_reserved(serialized, [';', ';;']) that checks if the serialized values contain semicolons. My use case is creating notifications for when a user (actor) replies (verb) with a comment (object) to another comment (target). Due to the architecture of the current system, I currently include the body of the comment as extra_context to the activity. The comment body may contain semicolons as it is user-input, and a SerializationException is thrown upon serialization.
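To make the failure concrete, here is a minimal sketch of a reserved-character check. It is not the framework's code: the `check_reserved` function and `SerializationException` class below are stand-ins written for illustration, assuming the real check simply rejects any value that contains a delimiter.

```python
class SerializationException(Exception):
    """Stand-in for the framework's exception of the same name."""


def check_reserved(value, reserved_characters):
    """Raise if any reserved delimiter appears in the serialized value."""
    for reserved in reserved_characters:
        if reserved in value:
            raise SerializationException(
                "encountered reserved character %r in %r" % (reserved, value)
            )


# A user-written comment body that happens to contain a semicolon.
comment_body = "Nice post; thanks for sharing"

try:
    check_reserved(comment_body, [";", ";;"])
    failed = False
except SerializationException:
    failed = True
```

With a check like this, any user-supplied text containing `;` is rejected before it reaches storage, which is the behaviour described above.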

This PR simply escapes semicolons before deserialization, allowing for semicolons in activities. I've added semicolons in the extra_context of a test activity to verify that it is correctly escaping the semicolons.
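The idea can be sketched roughly as follows. This is an illustration of backslash escaping in general, not the code in this PR: the helper names `escape`, `join_fields`, and `split_fields` are hypothetical, and the actual escaping scheme may differ.

```python
def escape(value):
    # Escape backslashes first so the transformation stays reversible,
    # then escape the delimiter itself.
    return value.replace("\\", "\\\\").replace(";", "\\;")


def join_fields(fields):
    # Join escaped fields with the reserved delimiter.
    return ";".join(escape(field) for field in fields)


def split_fields(serialized):
    # Walk the string once: a backslash makes the next character literal,
    # an unescaped semicolon ends the current field.
    fields, current, i = [], [], 0
    while i < len(serialized):
        ch = serialized[i]
        if ch == "\\" and i + 1 < len(serialized):
            current.append(serialized[i + 1])
            i += 2
        elif ch == ";":
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


original = ["reply", "a; b", "back\\slash", ""]
round_tripped = split_fields(join_fields(original))
```

Here `round_tripped` equals `original`, so a field such as a comment body can carry semicolons without being confused with the delimiter.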

izhan avatar Jul 22 '15 00:07 izhan

We've just hit this problem as well. Is there any reason this can't be merged in?

joealcorn avatar Sep 16 '15 10:09 joealcorn

This PR will break existing applications, or make it impossible to upgrade without purging or updating all data in Redis.

tbarbugli avatar Sep 16 '15 15:09 tbarbugli

Indeed. It's an unlikely edge case, but this will break for any existing activities that serialize into a string that ends with a backslash.
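A small sketch of that edge case, under the assumption that the new reader treats a backslash followed by a semicolon as escaped data. The record and both readers here are hypothetical and only illustrate the mismatch between old data and an escape-aware parser.

```python
import re

# Hypothetical legacy record: two fields joined with ";" and no escaping.
# The first field happens to end with a backslash.
legacy = "note\\" + ";" + "like"

# Old reader: every semicolon is a delimiter.
old_fields = legacy.split(";")

# Hypothetical new reader: a semicolon preceded by a backslash is treated
# as escaped data, so it no longer splits there.
new_fields = re.split(r"(?<!\\);", legacy)
```

The old reader sees two fields, while the escape-aware reader sees one, so data written before the change would be read back differently after it.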

izhan avatar Sep 16 '15 16:09 izhan

Are there any updates? We've hit this problem as well. It stops us from using Python 3 with this awesome project.

The reason is the default pickle protocol version. By default, Python 3 uses protocol 3, which is incompatible with our existing data (at the moment we use Python 2 😄).

In python3

>>> import pickle 
>>> pickle.dumps({'user_id': 12313, 'meta_id':19003}).decode('latin1') 
'\x80\x03}q\x00(X\x07\x00\x00\x00user_idq\x01M\x190X\x07\x00\x00\x00meta_idq\x02M;Ju.'
>>> pickle.dumps({'user_id': 12313, 'meta_id':19003}, 0).decode('latin1') 
'(dp0\nVuser_id\np1\nL12313L\nsVmeta_id\np2\nL19003L\ns.'

In python2

>>> import pickle 
>>> pickle.dumps({'user_id': 12313, 'meta_id':19003})
"(dp0\nS'user_id'\np1\nI12313\nsS'meta_id'\np2\nI19003\ns."
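The connection to semicolons can be checked directly. The sketch below reuses the same dictionary and passes the protocol explicitly, since the default protocol depends on the Python version (newer Python 3 releases default to protocol 4). In the binary protocols, 19003 is 0x4A3B, and its low byte 0x3B is the ASCII code for `;`.

```python
import pickle

payload = {"user_id": 12313, "meta_id": 19003}

# Binary protocol: small integers are packed as raw little-endian bytes,
# so 19003 (0x4A3B) is written as the bytes 0x3B 0x4A, i.e. ";J".
binary = pickle.dumps(payload, protocol=3)

# Text protocol: integers are written as decimal digits.
text = pickle.dumps(payload, protocol=0)

has_semicolon_binary = b";" in binary
has_semicolon_text = b";" in text
```

For this payload only the binary output contains a semicolon. Protocol 0 is not a general guarantee, though: a pickled string that itself contains `;` would still carry it through.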

EvgeneOskin avatar Apr 28 '17 04:04 EvgeneOskin

@EvgeneOskin how is this PR related to pickle output differences between Python 2 and Python 3?

tbarbugli avatar Apr 28 '17 09:04 tbarbugli

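As far as I understand, this pull request fixes a failure when the serialized string contains a semicolon. The Python 3 output above is one way to hit it: the protocol 3 pickle of 19003 contains the bytes M;J, which include a semicolon.

Also, I've solved my issue with a custom ActivitySerializer and AggregatedSerializer.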

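import base64

# Import paths assumed from the Stream-Framework source layout.
from stream_framework.serializers.activity_serializer import ActivitySerializer
from stream_framework.serializers.aggregated_activity_serializer import NotificationSerializer


class Base64ActivitySerializer(ActivitySerializer):
    """Base64-encode the serialized activity so it cannot contain ';'."""

    def dumps(self, activity):
        serialized = super(Base64ActivitySerializer, self).dumps(activity)
        return base64.b64encode(serialized.encode('latin1')).decode('latin1')

    def loads(self, serialized):
        data = base64.b64decode(serialized.encode('latin1')).decode('latin1')
        return super(Base64ActivitySerializer, self).loads(data)


class FollowingSerializer(NotificationSerializer):
    # Use the base64 serializer for the activities nested in each aggregate.
    activity_serializer_class = Base64ActivitySerializer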
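Why this works can be shown without the framework. The sketch below applies the same encode and decode steps to a plain string; the helper names `encode_field` and `decode_field` are hypothetical. The standard base64 alphabet consists of letters, digits, `+`, `/`, and `=` padding, so the encoded text never contains a semicolon.

```python
import base64


def encode_field(serialized):
    # latin1 maps code points 0-255 to single bytes, matching the
    # .decode('latin1') used on pickle output earlier in the thread.
    return base64.b64encode(serialized.encode("latin1")).decode("latin1")


def decode_field(encoded):
    # Reverse of encode_field.
    return base64.b64decode(encoded.encode("latin1")).decode("latin1")


# A string with binary-looking bytes and literal semicolons.
raw = "\x80\x03M;J. user said: hi; bye"
encoded = encode_field(raw)
decoded = decode_field(encoded)
```

Note that `encode("latin1")` raises `UnicodeEncodeError` for characters above U+00FF, so this only suits serialized strings that stay within that range.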

In my opinion, it would be nice to use JSON to serialize extra_context.

EvgeneOskin avatar Apr 28 '17 10:04 EvgeneOskin