Upload artifacts which contain lambdas
As I understand it, the `Task.upload_artifact` function in ClearML is limited by what Python's `pickle.dump` can do. As of now, `pickle.dump` cannot deal with objects that contain lambda functions.
Is there a way to upload these objects without resorting to storing the object on disk with e.g. dill and then uploading the file?
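For context, here is a minimal, self-contained sketch of the limitation being described: a plain dict pickles fine with the stdlib `pickle`, but one that holds a lambda does not (the helper name `can_pickle` is just for illustration):

```python
import pickle

def can_pickle(obj):
    """Return True if obj can be serialized with the stdlib pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A plain dict pickles fine, but one holding a lambda does not,
# because pickle serializes functions by reference (qualified name),
# and a lambda's name cannot be looked up at load time.
print(can_pickle({"a": 1}))                 # True
print(can_pickle({"fn": lambda x: x * 2}))  # False
```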
The default fallback is pickle, because dill (which is a great package) is not a built-in package. Relying on dill for serialization means we would always need to add it to the requirements. You can of course manually dill the object and upload the result. We did add the ability to pass a custom serialization function; maybe we should extend it to support dill.
So there is no way to specify a custom serialization function as a user of ClearML, e.g. when instantiating a new task?
Hi @schiegl! Thank you for opening this issue! At the moment, we don't support custom serialization, but it is a feature we would like to implement. There are 2 ways we could go about this (maybe implement both):
- Set default custom serialization/deserialization functions that replace the `pickle` functions at the task level. Something like `task.set_default_serialization(function)` and `task.set_default_deserialization(function)`.
- Set the serialization functions at the artifact upload/download level: `task.upload_artifact(name, artifact_object, serialization=function)` and `task.artifacts[name].get(deserialization=function)`.
What do you think?
In my case, both would work fine. If you do it at the task level, some kind of matching (e.g. with `isinstance`) might be necessary on the user's side.
Hi @schiegl FYI I am creating a feature request. Thanks for your contribution
@DavidNativ Hello, I'm also interested in this feature. What's the status of it?
Hi @schiegl and @gbartyzel ,
My apologies for not answering earlier. Install the latest clearml version, and in the `upload_artifact()` method you'll find the option to specify a `serialization_function`.
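To illustrate, here is a hedged sketch of what that might look like with dill. The `serialization_function` parameter is mentioned above; the matching `deserialization_function` name on `artifacts[...].get()`, as well as the project/task names, are assumptions for the example. The ClearML calls are commented out since they require a running server:

```python
import dill  # assumption: dill is installed (pip install dill)

def dill_serialize(obj):
    """Serialize an object with dill, which handles lambdas."""
    return dill.dumps(obj)

def dill_deserialize(blob):
    """Inverse of dill_serialize."""
    return dill.loads(blob)

# Hypothetical ClearML usage (needs a configured ClearML server):
#
# from clearml import Task
# task = Task.init(project_name="examples", task_name="dill-artifacts")
# task.upload_artifact(
#     "object_with_lambda",
#     artifact_object={"fn": lambda x: x * 2},
#     serialization_function=dill_serialize,
# )
# restored = task.artifacts["object_with_lambda"].get(
#     deserialization_function=dill_deserialize,  # assumed parameter name
# )

# Local round-trip check, no server needed:
obj = {"fn": lambda x: x * 2}
restored = dill_deserialize(dill_serialize(obj))
print(restored["fn"](21))  # 42
```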
@erezalg Well, but I'm currently playing with pipelines and I want to pass a PyTorch model from one step to another. It's not a pure PyTorch model; it's customized by our framework. Is this option also available in the pipeline decorator?
@gbartyzel, Actually no, we somehow missed that :) I'll make sure it's out in the next version that's supposed to be released in a week or two.
Oh that's great! Maybe it would be better to consider switching from pickle to dill?
@gbartyzel you mean replacing the default serializer, from pickle to dill? Or specifically in an experiment / pipeline?
@erezalg I mean replacing the default pickle serializer. As far as I can tell, dill is more advanced than the default pickle (lambda serialization etc.). But well, it's just a suggestion ;)
@gbartyzel We thought the same thing, but as much as we love dill (and we do :) ), it's yet another dependency that users might not need (as far as we know, it's less commonly used than pickle), and adding dependencies our users might not already have is something we try to avoid, as we've found it's not well accepted.
Anyway, we've tested this with dill and it works; we'll release it as soon as we can get you a nice interface to use in pipelines.
@erezalg Ok, a solid argument. So it would be awesome to be able to choose the serializer engine manually.
@gbartyzel That is the plan :) Will update here once it's out!
Hey @schiegl and @gbartyzel! v1.10.0 is now out, supporting custom artifact serialization for pipelines.
@schiegl, @gbartyzel, closing this, please reopen if required.