
Colander to serialize/deserialize numpy arrays

Open suntzu86 opened this issue 10 years ago • 2 comments

Since moe expects numpy arrays as input and gives numpy arrays as output, it'd be nice if colander supported this directly.

Currently (as a hack), I (eliu) call numpy.array() on the output of params.get() (from deserialized colander operations), and I pass output.tolist() to the colander serializer. This means I import numpy in every file, and we have numpy.array() and output.tolist() calls scattered throughout. It'd be nice to have all of these happen in one central place (via colander).

Colander supports type extensions; see the example at http://colander.readthedocs.org/en/latest/extending.html#an-example. So it looks like it could be as simple as providing a serializer that calls array.tolist() and then uses the "regular" serializer, plus a deserializer that uses the regular deserializer and calls numpy.array() on its output. There might be a more elegant way to do this too. I don't know much about colander, so I'm not sure if this would work, and I can't find any examples of other people interfacing colander and numpy.

If we switch to jsonschema (or whatever), that new tool should also support this.

suntzu86, Apr 18 '14 01:04

Before assigning this as a new-hire task, let's flesh out a more concrete plan.

suntzu86, Jun 04 '14 01:06

As an addendum to this, we have a lot of colander schemas that contain the same information as various Python objects (SamplePoint, HistoricalData, CovarianceInterface implementations, GaussianProcess, NewtonParameters, etc. all have corresponding colander schemas).

To be more DRY, it'd be nice to at least do:

  • have colander serialize via "to_json"-style functions on each of these classes, and have colander deserialization create the respective objects directly. This is a natural generalization of serializing/deserializing numpy arrays directly.

if not also

  • unify these representations so we don't have multiple definitions of the same stuff. For example, we could push validation into the object constructors so that colander schemas are just a loose wrapper around their respective objects (instead of redefining all the internal components).

suntzu86, Jul 19 '14 01:07