
Need a more comprehensive serialization solution for Python

Open lbustelo opened this issue 8 years ago • 3 comments

@jtyberg said in #138:

Users can still hit serialization errors rather easily though. Just try to pass a dict object with a date object as a value to a widget and boom. So I added a rather comprehensive example to the urth-core-function sample notebook to document ways to serialize Python dates, numpy types, etc.

I originally thought about adding serializer classes for these types into the widget code base, but the widget serializer code does not recurse into objects. So if the widget code had a serializer class that serializes date objects (or the user writes one), it will never get called on the date object that is an item in a dict that gets returned to a widget from a function. In addition, the class-based serialization does not fit Python style, where most functions return lists, tuples, dicts, etc., rather than custom classes. So it seems unnatural for a user to write a serializer class to serialize the items in a dict or tuple, for example. I'd rather just write a single function that handles all the types I know I'll be dealing with, as I did in the urth-core-function notebook example.

Inevitably, notebook users will hit a limit where they have to write code to serialize objects, but I wonder if there's more we should do to push this limit further out, such as recursing into basic python iterables, or providing some default serialization behavior that can be more easily overridden.
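To make the "single function" idea concrete, here is a minimal sketch of a helper that recurses into basic Python containers before handing the result to json.dumps. The name to_jsonable and the choice of ISO 8601 strings for dates are assumptions for illustration; this is not the code from the urth-core-function notebook. The tolist() check is a duck-typed guess at handling numpy arrays and scalars without importing numpy.

```python
import datetime
import json


def to_jsonable(obj):
    """Recursively convert obj into something json.dumps accepts.

    Hypothetical helper: the name and the conversions chosen here are
    illustrative assumptions, not part of declarativewidgets.
    """
    if isinstance(obj, dict):
        # JSON object keys must be strings.
        return {str(key): to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_jsonable(item) for item in obj]
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    if hasattr(obj, "tolist"):
        # Duck-typed: numpy arrays and scalars expose tolist().
        return to_jsonable(obj.tolist())
    return obj


payload = {"when": datetime.date(2015, 12, 9), "points": [(1, 2), (3, 4)]}

# Without the helper, the nested date makes json.dumps raise TypeError.
try:
    json.dumps(payload)
    plain_dumps_failed = False
except TypeError:
    plain_dumps_failed = True

encoded = json.dumps(to_jsonable(payload))
```

Because the helper walks dicts, lists, and tuples itself, a date nested anywhere in the returned structure is converted, which is the case the class-based serializers miss.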

I can see 2 options

Option 1

Stay the course: keep serialization helpers in our widgets and fix the problem of serializing nested structures. There must be a Python package that would help with this, similar to how we rely on Play.Json in Scala. This option would also benefit from an enhancement to ipywidgets to accept pre-serialized data, because certain cases ultimately return a stringified JSON blob, and there is no way to send that to ipywidgets without turning it back into an object.

Right now we are ultimately relying on ipywidgets to turn our message into a string.

def _send_update(self, attribute, value):
    """
    Sends a message to update the front-end state of the given attribute.
    """
    msg = {
        "method": "update",
        "state": {
            attribute: value
        }
    }
    self._send(msg)

The Widget._send() method will try to serialize the msg object. If it encounters anything that it cannot serialize, it fails. In declarativewidgets, we try to turn objects into Serializable ones, but ultimately the serialization is done by ipywidgets.

If we call pandas DataFrame.to_json(), which gives us a string, there is no way to tell _send() to unwrap that string and place it in the message. We are forced to load that JSON string back into a dict before giving it to _send().
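The round trip described above can be illustrated with the standard json module alone. In this sketch, pre_serialized stands in for the string that a call like DataFrame.to_json() would return, and build_update is a hypothetical stand-in for the message built in _send_update; json.dumps plays the role of the final serialization step.

```python
import json

# Stand-in for the string returned by something like DataFrame.to_json().
pre_serialized = '{"a": {"0": 1, "1": 2}}'


def build_update(attribute, value):
    """Hypothetical stand-in for the message built by _send_update."""
    return {"method": "update", "state": {attribute: value}}


# Passing the string straight through: it is encoded a second time,
# so the client receives a string rather than an object.
double_encoded = json.dumps(build_update("result", pre_serialized))
client_value = json.loads(double_encoded)["state"]["result"]

# The current workaround: parse the string back into a dict first,
# only for it to be serialized again on the way out.
round_tripped = json.dumps(build_update("result", json.loads(pre_serialized)))
client_object = json.loads(round_tripped)["state"]["result"]
```

In the first case the client would have to guess that the string is really a stringified object; in the second the data is parsed and re-serialized purely to satisfy the message format.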

Option 2

Remove serialization concerns from our widgets and rely on ipywidgets to provide comprehensive, extensible support for serialization. ipywidgets ultimately takes the message and sends it over the wire. This would mean that widgets like kernel-python/urth/widgets/widget_function.py would not call any serialization helper, but would instead put the object returned by the function straight into the message that goes to Widget._send().

In the end, the key behavior to retain is that the client side should not have to differentiate between an actual string value and one that is really an object but has been stringified.

I'm leaning towards Option 2 because that would put serialization concerns closer to where the data is sent through the wire.

/cc @jdfreder @SylvainCorlay

lbustelo avatar Dec 09 '15 17:12 lbustelo

I'm leaning towards Option 2 because that would put serialization concerns closer to where the data is sent through the wire.

Also, I'm sure we don't want other, future libraries that build atop ipywidgets to have to fix the same problem and duplicate everything done in declarative widgets. This becomes more of a concern now that ipywidgets is becoming less strictly coupled with the notebook and can be used in other contexts.

parente avatar Dec 09 '15 17:12 parente

I think the problem here is the API. As a Python user, the Python json.dumps API is what I would want. The function will serialize nested objects just fine, but when it encounters an object it can't handle, it raises. In such cases, the user can provide either a default function or a class to handle any object that dumps doesn't know how to handle. Gimme that.
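For reference, this is roughly what the json.dumps extension points look like in practice. The function encode_unknown and the class DateEncoder are illustrative names, and converting dates to ISO 8601 strings is just one possible choice.

```python
import datetime
import json


def encode_unknown(obj):
    """Called by json.dumps only for objects it cannot serialize itself."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


class DateEncoder(json.JSONEncoder):
    """Class-based alternative: override default() on a JSONEncoder."""

    def default(self, obj):
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()
        return super().default(obj)


data = {"rows": [{"day": datetime.date(2015, 12, 10), "count": 3}]}

# Plain json.dumps recurses into the dict and list, then raises on the date.
try:
    json.dumps(data)
    raised = False
except TypeError:
    raised = True

via_function = json.dumps(data, default=encode_unknown)
via_class = json.dumps(data, cls=DateEncoder)
```

Both hooks are invoked for nested values, so the user never has to walk the structure by hand.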

Unfortunately, the serialization of objects going over the comm channel is triggered deep in the bowels of Jupyter. Not in ipywidgets, not even in ipykernel, but in jupyter_client/session.py. This is far, far away from the widgets user, who ultimately has to provide the alternate serialization mechanism, because it's not feasible for the Jupyter internals to handle every case without throwing up its hands and invoking repr on the object.

jtyberg avatar Dec 10 '15 14:12 jtyberg

I wonder if the fix for this would be to create a custom traitlet that holds a value, implements __repr__, and delegates down to the contained object.

For example in kernel-python/urth/widgets/widget_function.py we would have the return value of the function be held as a traitlet instance like

result = ResultTraitlet(None, sync=True)

When the function is invoked, we just update the result traitlet, and that triggers the update message back to the client. When the ResultTraitlet reaches the point of serialization, its __repr__ would delegate to the underlying object. At that point we are returning a JSON string, so we can call things like the pandas to_json method and return the result directly. ResultTraitlet would then incorporate the serialization support.
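The delegation idea can be sketched in plain Python, without traitlets. Everything here is hypothetical: ResultHolder is not the proposed ResultTraitlet, FakeFrame merely stands in for an object with its own to_json() such as a pandas DataFrame, and the sketch does not show whether or how the Jupyter serialization path would actually pick up the __repr__ output.

```python
import json


class ResultHolder:
    """Hypothetical holder whose repr delegates to the contained object."""

    def __init__(self, value=None):
        self.value = value

    def to_json_string(self):
        # Prefer the contained object's own serializer when it has one.
        to_json = getattr(self.value, "to_json", None)
        if callable(to_json):
            return to_json()
        return json.dumps(self.value)

    def __repr__(self):
        return self.to_json_string()


class FakeFrame:
    """Stand-in for an object with its own to_json(), e.g. a DataFrame."""

    def __init__(self, records):
        self.records = records

    def to_json(self):
        return json.dumps(self.records)


frame_repr = repr(ResultHolder(FakeFrame([{"a": 1}])))
dict_repr = repr(ResultHolder({"x": [1, 2]}))
```

The point of the design is that the holder, not the widget, decides how its contents become a JSON string, so objects that already know how to serialize themselves are not parsed and re-encoded.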

The same traitlet should be reusable for the channel().set() API.

lbustelo avatar Dec 16 '15 18:12 lbustelo