Stream-Framework
Stream-Framework copied to clipboard
Escape semicolons
I am currently running Stream-Framework with production data, and recently ran into an issue with how stream-framework serialize Aggregated Activities.
In aggregated_activity_serializer.py, we have the line check_reserved(serialized, [';', ';;']) that checks if the serialized values contain semicolons. My use case is creating notifications for when a user (actor) replies (verb) with a comment (object) to another comment (target). Due to the architecture of the current system, I currently include the body of the comment as extra_context to the activity. The comment body may contain semicolons as it is user-input, and a SerializationException is thrown upon serialization.
This PR simply escapes semicolons before deserialization, allowing for semicolons in activities. I've added semicolons in the extra_context
of a test activity to verify that it is correctly escaping the semicolons.
We've just hit this problem as well, is there any reason this can't be merged in?
this PR will break existing applications or make it impossible to upgrade without purging/updating all data in Redis.
indeed, it's an unlikely edge case, but this will break for any existing activities that serializes into a string that ends with a backslash.
Is there any updates? We've hit this problem as well. It stops us from using python3 with this awesome project.
The reason in default pickle version. By default python3 use version 3. But it's incompatible with existed data (at the moment we use python2 😄 ).
In python3
>>> import pickle
>>> pickle.dumps({'user_id': 12313, 'meta_id':19003}).decode('latin1')
'\x80\x03}q\x00(X\x07\x00\x00\x00user_idq\x01M\x190X\x07\x00\x00\x00meta_idq\x02M;Ju.'
>>> pickle.dumps({'user_id': 12313, 'meta_id':19003}, 0).decode('latin1')
'(dp0\nVuser_id\np1\nL12313L\nsVmeta_id\np2\nL19003L\ns.'
In python2
>>> import pickle
>>> pickle.dumps({'user_id': 12313, 'meta_id':19003})
"(dp0\nS'user_id'\np1\nI12313\nsS'meta_id'\np2\nI19003\ns."
@EvgeneOskin how is this PR related to pickle output differences between Python 2 and Python 3?
As far as I understand this pull request fix a fail when serialized string contains semicolon.
Also, I've solved my issue with custom ActivitySerializer
and AggregatedSerializer
.
class Base64ActivitySerializer(ActivitySerializer):
def dumps(self, activity):
serialized = super(Base64ActivitySerializer, self).dumps(activity)
return base64.b64encode(serialized.encode('latin1')).decode('latin1')
def loads(self, serialized):
data = base64.b64decode(serialized.encode('latin1')).decode('latin1')
return super(Base64ActivitySerializer, self).loads(data)
class FollowingSerializer(NotificationSerializer):
activity_serializer_class = Base64ActivitySerializer
To me it would be nice to use json
to serialize extra_context
.