How to handle common, non-primitive data types

Open jtc42 opened this issue 4 years ago • 2 comments

This is likely going to be an open question for a while, but these are my current thoughts. All input is welcome.

I feel like, by and large, data collected from lab instruments can sensibly be converted to primitive data types. The most common types I have in mind are NumPy arrays and pandas DataFrames. Both of these can be represented easily with primitive data types, for example as nested lists of numbers or as dictionaries of lists.

There are however cases where data will be collected that cannot be converted to a primitive type.

In the new cbor branch, I've added a section to the JSON encoder that base64-encodes Python bytes objects. I've correspondingly included a Marshmallow Bytes field to handle validating binary data in this format. It also populates the documentation with a note that the string value is a base64-encoded block of binary data. Everything is fine on that front.
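As a rough illustration of the behaviour described above, here is a minimal sketch using only the standard library. It is not the code from the cbor branch; `Base64BytesEncoder` and `decode_bytes_field` are made-up names, and the payload is invented for the example.

```python
import base64
import json


class Base64BytesEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialises bytes objects as base64 strings."""

    def default(self, o):
        if isinstance(o, (bytes, bytearray)):
            # JSON has no binary type, so emit an ASCII base64 string instead
            return base64.b64encode(bytes(o)).decode("ascii")
        # Defer to the base class, which raises TypeError for unknown types
        return super().default(o)


def decode_bytes_field(value: str) -> bytes:
    """Hypothetical counterpart: validate a base64 string and decode it."""
    return base64.b64decode(value, validate=True)


# Invented example payload: one primitive value and one binary value
payload = {"exposure_ms": 20, "image": b"\x00\x01\x02\xff"}
encoded = json.dumps(payload, cls=Base64BytesEncoder)
decoded = json.loads(encoded)
raw = decode_bytes_field(decoded["image"])
```

After the round trip, `decoded["image"]` is an ordinary string and `raw` holds the original bytes again.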

However, as @rwb27 has mentioned in the past, sometimes the binary data collected will be large enough that the base64 encoding overhead (roughly a third more data than the raw bytes) could become problematic. To handle these cases, I've included support for clients to accept application/cbor responses instead of application/json.
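The content negotiation could look something like the sketch below. This is an assumption about the general shape, not the branch's implementation: `choose_media_type` is a hypothetical helper, and it deliberately ignores q-values, which a real server would normally leave to its web framework's Accept-header parsing.

```python
def choose_media_type(accept_header: str) -> str:
    """Return "application/cbor" only if the client explicitly lists it."""
    # Split "type/subtype;q=0.8, ..." into bare media types, dropping parameters
    offered = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/cbor" in offered:
        return "application/cbor"
    # JSON stays the default for everything else, including "*/*"
    return "application/json"
```

Keeping JSON as the default means existing clients that never send an Accept header see no change.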

CBOR has built-in support for binary data, so if a client requests a CBOR response, no base64 overhead is introduced. The bytes are passed directly into the CBOR response, which is otherwise identical to the JSON response but with the binary section left unencoded.
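To put a number on the difference, the sketch below compares the size of a binary blob as a base64 string in JSON with the same blob as a CBOR byte string. The byte-string framing (major type 2 plus a length prefix) follows RFC 8949; `cbor_byte_string` is a hand-written helper for illustration only, and a real application would use a CBOR library instead. The 3 MB blob is an arbitrary stand-in.

```python
import base64
import json


def cbor_byte_string(data: bytes) -> bytes:
    """Encode data as a single CBOR byte string (major type 2, RFC 8949)."""
    n = len(data)
    if n < 24:
        head = bytes([0x40 | n])  # length fits in the initial byte
    elif n < 2**8:
        head = bytes([0x58]) + n.to_bytes(1, "big")
    elif n < 2**16:
        head = bytes([0x59]) + n.to_bytes(2, "big")
    elif n < 2**32:
        head = bytes([0x5A]) + n.to_bytes(4, "big")
    else:
        head = bytes([0x5B]) + n.to_bytes(8, "big")
    return head + data


blob = bytes(3_000_000)  # stand-in for a 3 MB binary capture
json_size = len(json.dumps(base64.b64encode(blob).decode("ascii")))
cbor_size = len(cbor_byte_string(blob))
```

For this blob, `json_size` is 4,000,002 bytes (the base64 text plus two quote characters) and `cbor_size` is 3,000,005 bytes (the raw data plus a five-byte header).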

This solution isn't perfect though. The Thing Description is required to be JSON. This is fine in most cases, as it accurately describes the base64-encoded binary blobs. However, it means that the CBOR response will deviate from the Thing Description: the client receives a bytes value where the Description says a string will be returned.
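One way a client could cope with that mismatch is to accept either representation and normalise it. The following is only a sketch of that idea; `as_bytes` is a hypothetical client-side helper, not part of python-labthings.

```python
import base64
from typing import Union


def as_bytes(value: Union[str, bytes, bytearray]) -> bytes:
    """Accept either representation of a binary value and return raw bytes."""
    if isinstance(value, (bytes, bytearray)):
        # CBOR response: the value is already binary
        return bytes(value)
    if isinstance(value, str):
        # JSON response: the value is base64 text, as the Description states
        return base64.b64decode(value, validate=True)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")
```

With a helper like this, client code can treat `as_bytes(response["image"])` the same way regardless of which media type it requested.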

However, I currently feel that cases where large, non-primitive data is collected so frequently that CBOR encoding is required are rare enough that, given proper documentation, this solution could still be fine.

Again, thoughts are welcome.

Note: The CBOR branch is useful even aside from this. It's a much more compact data format than JSON, so in many cases it may be beneficial to communicate over CBOR even without needing to transfer bytes objects. It was easy to add support, and it doesn't affect the JSON functionality at all.

jtc42 • May 02 '20 18:05