
Shareable pre-trained models

dwf opened this issue 8 years ago • 12 comments

Other frameworks, notably Caffe and Torch, support loading serialized models that other people have trained, which facilitates wider sharing and reuse of results. That Blocks doesn't really have any way of doing this is a notable weakness.

Pickle isn't really the right kind of serialization format for this, I feel. NPY files work for parameters, but it would be nice to have a less Python-specific and less brittle serialization format for ComputationGraphs and any Bricks that were used to build them.

This is a truly ambitious ticket, but I figure it can centralize discussion at whatever pace things happen.

dwf avatar Jul 08 '15 18:07 dwf

Pickle isn't really the right kind of serialization format for this, I feel.

Can I ask you to elaborate a bit?

rizar avatar Jul 09 '15 08:07 rizar

For serializing entire models (including Python classes), a pickled model is tied to the particular version of the code used to generate it, unless the library is willing to put a lot of effort into keeping pickled models compatible. For serializing only the parameters, pickling arrays is inefficient (especially in terms of memory use) compared to npy (or npz).
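For illustration, a minimal sketch of the parameter-only route (hypothetical parameter names; plain NumPy, no Blocks involved):

```python
import numpy as np

# Each array is stored in raw binary form inside a zip archive; unlike
# pickle, the npy/npz format is documented and readable outside Python.
parameters = {'linear_0.W': np.zeros((784, 10), dtype='float32'),
              'linear_0.b': np.zeros(10, dtype='float32')}
np.savez('parameters.npz', **parameters)

with np.load('parameters.npz') as archive:
    W = archive['linear_0.W']
```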

If we want to serialize model parameters in a structured way that is more easily readable from outside Python, I would suggest HDF5. Of course, someone wanting to write a loader for another library would still have to dig through the Blocks code (and the right revision...) to understand what operations are performed, how exactly to use the parameters, and so on. If we want to serialize ComputationGraphs and Bricks themselves, I don't think we can find something that is not Python-specific.

lamblin avatar Jul 09 '15 15:07 lamblin

I would definitely like to see this -- quite a noticeable gap at the moment. On a related point, how do I make predictions from a pre-trained Model? Right now the library only seems to help with training and evaluation.

Suggestion: what if you give up on making this code-version-independent, and just put some convenience methods on Brick for persisting its parameters (as npz or HDF5) and initializing its parameter values from disk, making it easy for the user to override them to provide some level of backwards compatibility? This way I can easily persist my top-level Brick. Sure, I have to worry about breaking compatibility if I change the parameters or the behaviour of that brick, but I'm in control and there's relatively little magic versus relying on pickle.
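Something like the following, as a hedged sketch (hypothetical helper names, not an existing Blocks API; assumes Selector.get_parameters() for walking the brick hierarchy):

```python
import numpy as np
from blocks.select import Selector

def save_brick_parameters(brick, path):
    # Selector yields names like '/mlp/linear_0.W'; replace '/' so the
    # names are safe as npz archive member names.
    params = Selector([brick]).get_parameters()
    np.savez(path, **{name.replace('/', '|'): p.get_value()
                      for name, p in params.items()})

def load_brick_parameters(brick, path):
    params = Selector([brick]).get_parameters()
    with np.load(path) as archive:
        for name, p in params.items():
            p.set_value(archive[name.replace('/', '|')])
```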

Or perhaps there could be a model-like class that wraps a top-level brick (or any other Theano expression), enables this kind of save/load functionality, and behaves a bit more like you'd expect a model to behave (with an API to make predictions, for one), without coupling it to the cost function used to train it?

mjwillson avatar Jul 20 '15 18:07 mjwillson

The philosophy of Blocks is to stay pretty close to being an object-oriented layer on top of Theano that ties together parameters with methods that generate symbolic expressions involving them. If one wants to "make predictions" with a model, Blocks doesn't presume to know what that means -- it's up to the user to construct a Theano expression representing what she wants to evaluate.

I do like the idea of an HDF5-backed persistence layer for models, where brick names and the brick hierarchy are used to define HDF5 groups for storing parameters together, with the requisite structural information stored as HDF5 metadata attributes.
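As a rough sketch of that layout (hypothetical group and attribute names, nothing Blocks currently provides):

```python
import h5py
import numpy as np

# One group per brick, mirroring the brick hierarchy; parameters are
# datasets, and structural information lives in HDF5 attributes.
with h5py.File('model.h5', 'w') as f:
    mlp = f.create_group('mlp')
    mlp.attrs['brick_class'] = 'MLP'
    linear = mlp.create_group('linear_0')
    linear.attrs['brick_class'] = 'Linear'
    linear.attrs['input_dim'] = 784
    linear.attrs['output_dim'] = 10
    linear.create_dataset('W', data=np.zeros((784, 10), dtype='float32'))
    linear.create_dataset('b', data=np.zeros(10, dtype='float32'))
```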

dwf avatar Jul 20 '15 19:07 dwf

OK, fair enough -- that's useful to know.

It might be worth showing an example of the recommended approach to creating some kind of model object with a prediction interface that can be persisted and that encapsulates the Theano internals, though. I imagine this is a pretty common requirement for anyone who wants to ship the models they train. I can roll my own, sure, but I'd tend to look to the framework I use to train models at least for hints or conventions about how to do this. For example, I initially expected the Model class to be the place for this, but it turned out to be slightly misleadingly named -- it's more of an adapter object that MainLoop requires you to construct in order to wrap up a cost function for training.

What you mention re HDF5-based persistence sounds great, anyway!

mjwillson avatar Jul 21 '15 11:07 mjwillson

Bumping this ticket, since this is only going to get more painful as people share models between Python 2 and Python 3. I was trying to share a model with @ddtm and had to resort to pickling a dict of selector names -> ndarrays.
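For reference, a minimal sketch of that workaround (hypothetical parameter names; the encoding argument is the usual gotcha when such a pickle crosses from Python 2 to Python 3):

```python
import pickle
import numpy as np

values = {'/mlp/linear_0.W': np.zeros((784, 10), dtype='float32')}
with open('params.pkl', 'wb') as f:
    # Protocol 2 is writable and readable from both Python 2 and 3.
    pickle.dump(values, f, protocol=2)

# On the Python 3 side, arrays pickled under Python 2 need latin1 decoding.
with open('params.pkl', 'rb') as f:
    values = pickle.load(f, encoding='latin1')
```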

dwf avatar May 04 '16 01:05 dwf

Ran across a good model for what we might try to achieve here: https://camel.readthedocs.io/en/latest/camel.html

A lot of the logic would be identical for all Brick classes, and if you've got a custom one, well, you write down the extra knowledge of how to serialize the relevant state and deserialize it into a working brick -- or you don't get to use the tool-agnostic/portable serialization code.
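To make that concrete, a minimal sketch of a Camel registry for one brick type (a simplified treatment of blocks.bricks.Linear; a real version would also need to handle parameter values and initialization schemes):

```python
from camel import Camel, CamelRegistry
from blocks.bricks import Linear

bricks_registry = CamelRegistry()

# Dumper: reduce the brick to a plain, YAML-friendly dict.
@bricks_registry.dumper(Linear, u'linear', version=1)
def _dump_linear(brick):
    return {u'name': brick.name,
            u'input_dim': brick.input_dim,
            u'output_dim': brick.output_dim}

# Loader: rebuild a working brick from that dict; the version argument
# is how Camel supports evolving formats over time.
@bricks_registry.loader(u'linear', version=1)
def _load_linear(data, version):
    return Linear(name=data[u'name'], input_dim=data[u'input_dim'],
                  output_dim=data[u'output_dim'])

serialized = Camel([bricks_registry]).dump(Linear(input_dim=784, output_dim=10))
brick = Camel([bricks_registry]).load(serialized)
```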

dwf avatar Jun 22 '16 23:06 dwf

How much code would we need to add? It looks like we'd need to write a dumper and a loader for every brick, the main loop objects, the logs, even NumPy objects. That's quite a bit of work.

Additionally, it might not be obvious how to write serializers for Theano objects like RNGs.

dmitriy-serdyuk avatar Jun 23 '16 18:06 dmitriy-serdyuk

Oh, the point of this ticket is not to replace the checkpointing system we have. That is basically fine for the purpose it serves.

The point of this ticket is just to be able to save models (at least, hierarchies of bricks) in a format where the structure, as well as the parameters, is interpretable across Python versions and across programming platforms (so that someone could write a Torch loader, for example), in a way that is suitable for long-term storage.

dwf avatar Jun 29 '16 04:06 dwf

I see; in that case Camel looks like a very good solution. I like its interface.

dmitriy-serdyuk avatar Jun 29 '16 19:06 dmitriy-serdyuk

I don't think it meets all of our needs, in that it would need to be extended to store big binary blobs (the parameters) efficiently, but it is certainly a good place to start.

dwf avatar Jun 29 '16 19:06 dwf

This use case doesn't need that much efficiency; we can afford several seconds of loading. The easiest solution is to store the parameters as base64-encoded npy strings inside the YAML. That should be portable and more or less easy to load from both Python and Torch.
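A minimal sketch of that scheme (plain NumPy plus PyYAML; the helper names are hypothetical):

```python
import base64
import io

import numpy as np
import yaml

def array_to_b64(arr):
    # npy keeps dtype and shape; base64 makes the bytes YAML-safe.
    buf = io.BytesIO()
    np.save(buf, arr)
    return base64.b64encode(buf.getvalue()).decode('ascii')

def array_from_b64(s):
    return np.load(io.BytesIO(base64.b64decode(s)))

params = {'mlp/linear_0.W': np.zeros((784, 10), dtype='float32')}
doc = yaml.safe_dump({k: array_to_b64(v) for k, v in params.items()})
restored = {k: array_from_b64(s) for k, s in yaml.safe_load(doc).items()}
```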

dmitriy-serdyuk avatar Jun 29 '16 21:06 dmitriy-serdyuk