
Upload artifacts which contain lambdas

Open schiegl opened this issue 2 years ago • 5 comments

As I understand it, the Task.upload_artifact function in ClearML is limited by what Python's pickle.dump can do, and pickle.dump currently cannot serialize objects that contain lambda functions.

Is there a way to upload these objects without resorting to storing the object on disk with e.g. dill and then uploading the file?

schiegl, Jun 02 '22 10:06
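A minimal, stdlib-only sketch that reproduces the limitation described above: an object holding a lambda in an attribute cannot be serialized with the standard pickle module. `Model` and `try_pickle` are hypothetical names used only for illustration. The exact exception type varies across Python versions, so the helper catches the common ones.

```python
import pickle


class Model:
    """Hypothetical object that stores a lambda as an attribute."""

    def __init__(self):
        # pickle serializes functions by reference (module + qualified name);
        # a lambda has no importable name, so pickling this object fails.
        self.activation = lambda x: max(0.0, x)


def try_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exception type raised for unpicklable functions depends on the Python version.
        return False


print(try_pickle({"lr": 0.01}))  # → True
print(try_pickle(Model()))       # → False
```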

The default fallback is pickle, because dill (which is a great package) is not a built-in package. Relying on dill for serialization means we would always need to add it to the requirements. You can of course dill the object manually and upload the result. We did add the ability to pass a custom serialization function; maybe we should extend it to support dill.

DavidNativ, Jun 02 '22 12:06
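The manual workaround discussed so far (serialize the object yourself, write it to disk, then upload the file) can be sketched as follows. `dump_to_file` and `load_from_file` are hypothetical helpers, not ClearML functions. They take a pickle-compatible `dump`/`load` pair, so `dill.dump`/`dill.load` can be passed in when dill is installed; the default here is the stdlib pickle so the sketch runs without extra dependencies.

```python
import pickle
import tempfile
from pathlib import Path


def dump_to_file(obj, directory, filename="artifact.pkl", dump=pickle.dump):
    """Write obj to directory/filename with `dump` and return the path.

    `dump` follows the pickle.dump(obj, file) signature, so dill.dump
    can be swapped in unchanged if dill is installed.
    """
    path = Path(directory) / filename
    with open(path, "wb") as f:
        dump(obj, f)
    return path


def load_from_file(path, load=pickle.load):
    """Read back an object written by dump_to_file."""
    with open(path, "rb") as f:
        return load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = dump_to_file({"weights": [0.1, 0.2]}, tmp)
    print(load_from_file(path))  # → {'weights': [0.1, 0.2]}
```

Since `upload_artifact` accepts a local file path as the artifact object, the returned path could then be uploaded with something like `task.upload_artifact(name="model", artifact_object=str(path))`, provided the file still exists when the upload runs.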

We did add the ability to add custom serialization function, maybe we should extend it to support dill

Is there no way for a ClearML user to specify a custom serialization, e.g. when instantiating a new task?

schiegl, Jun 02 '22 13:06

Hi @schiegl! Thank you for opening this issue! At the moment, we don't support custom serialization, but it is a feature we would like to implement. There are 2 ways we could go about this (maybe implement both):

  1. Set default custom serialization/deserialization functions that would replace the pickle functions at the task level. Something like: task.set_default_serialization(function) and task.set_default_deserialization(function)
  2. Set the serialization functions at the artifact upload/download level: task.upload_artifact(name, artifact_object, serialization=function) and task.artifacts[name].get(deserialization=function)

What do you think?
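To make the two proposals concrete, here is a hypothetical in-memory stand-in for a task's artifact storage that supports both: a task-level default (option 1) and a per-call override (option 2) that falls back to the default. `ArtifactStore`, `upload`, and `get` are illustrative names only; this is not ClearML's implementation.

```python
import json
import pickle
from typing import Any, Callable, Dict, Optional

Serializer = Callable[[Any], bytes]
Deserializer = Callable[[bytes], Any]


class ArtifactStore:
    """Hypothetical artifact store illustrating both proposed designs."""

    def __init__(self):
        self._blobs: Dict[str, bytes] = {}
        self._serialize: Serializer = pickle.dumps      # default, as today
        self._deserialize: Deserializer = pickle.loads

    # Option 1: replace the pickle defaults for the whole task
    def set_default_serialization(self, fn: Serializer) -> None:
        self._serialize = fn

    def set_default_deserialization(self, fn: Deserializer) -> None:
        self._deserialize = fn

    # Option 2: override per upload/download, falling back to the task default
    def upload(self, name: str, obj: Any, serialization: Optional[Serializer] = None) -> None:
        self._blobs[name] = (serialization or self._serialize)(obj)

    def get(self, name: str, deserialization: Optional[Deserializer] = None) -> Any:
        return (deserialization or self._deserialize)(self._blobs[name])


store = ArtifactStore()
store.upload("config", {"lr": 0.01})  # uses the task default (pickle)
store.upload("meta", {"v": 1}, serialization=lambda o: json.dumps(o).encode())
print(store.get("config"))  # → {'lr': 0.01}
print(store.get("meta", deserialization=lambda b: json.loads(b.decode())))  # → {'v': 1}
```

With both in place, the per-call argument handles one-off artifacts while the task-level default avoids repeating the same function on every call.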

In my case, both would work fine. If you do it at the task level, some kind of type matching (e.g. with isinstance) might be necessary on the user's side.

schiegl, Jun 03 '22 08:06
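A sketch of the isinstance matching a user might write for a single task-level serializer: rules are checked in registration order, and anything unmatched falls back to pickle. `register`, `serialize`, and the JSON rule for dicts are hypothetical choices for illustration.

```python
import json
import pickle
from typing import Any, Callable, List, Tuple

# Hypothetical registry of (type, serializer) pairs, checked in order with isinstance.
_RULES: List[Tuple[type, Callable[[Any], bytes]]] = []


def register(cls: type, fn: Callable[[Any], bytes]) -> None:
    """Use fn for every object that is an instance of cls."""
    _RULES.append((cls, fn))


def serialize(obj: Any) -> bytes:
    """Task-level serializer: the first matching isinstance rule wins, else pickle."""
    for cls, fn in _RULES:
        if isinstance(obj, cls):
            return fn(obj)
    return pickle.dumps(obj)


register(dict, lambda d: json.dumps(d, sort_keys=True).encode("utf-8"))

print(serialize({"b": 1, "a": 2}))                # → b'{"a": 2, "b": 1}'
print(serialize([1, 2]) == pickle.dumps([1, 2]))  # → True
```

Note that the matching deserializer would need some way to tell which rule produced a given blob (for example, a type tag stored alongside it), which is extra bookkeeping the per-artifact option avoids.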

Hi @schiegl, FYI, I am creating a feature request. Thanks for your contribution!

DavidNativ, Jun 09 '22 07:06

@DavidNativ Hello, I'm also interested in this feature. What's the status of it?

gbartyzel, Oct 20 '22 10:10

Hi @schiegl and @gbartyzel,

My apologies for not answering earlier. Install the latest clearml version: its upload_artifact() method now lets you specify a serialization_function.

erezalg, Oct 20 '22 12:10
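A sketch of a serializer/deserializer pair that could be passed to that argument, assuming `serialization_function` takes the object and returns `bytes`, and that reading the artifact back accepts a matching bytes-to-object `deserialization_function` (check the documentation for your clearml version). It uses dill when it is installed and falls back to pickle otherwise, which keeps dill an optional dependency.

```python
import pickle
from typing import Any

try:
    import dill  # optional third-party package; can serialize lambdas
    _backend = dill
except ImportError:
    _backend = pickle  # stdlib fallback; lambdas will still fail here


def serialize(obj: Any) -> bytes:
    """Object -> bytes, the shape a serialization function is expected to have."""
    return _backend.dumps(obj)


def deserialize(data: bytes) -> Any:
    """bytes -> object, the inverse of serialize()."""
    return _backend.loads(data)


blob = serialize({"epochs": 3})
print(type(blob).__name__, deserialize(blob))  # → bytes {'epochs': 3}
```

Under those assumptions, the wiring in ClearML would look roughly like `task.upload_artifact("model", model, serialization_function=serialize)` for upload and `task.artifacts["model"].get(deserialization_function=deserialize)` for retrieval.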

@erezalg Well, I'm currently experimenting with pipelines, and I want to pass a PyTorch model from one step to another. It's not a plain PyTorch model; it's customized by our framework. Is this option also available in the pipeline decorator?

gbartyzel, Oct 20 '22 12:10

@gbartyzel, actually no, we somehow missed that :) I'll make sure it's in the next version, which is due to be released in a week or two.

erezalg, Oct 20 '22 12:10

Oh that's great! Maybe it would be better to consider switching from pickle to dill?

gbartyzel, Oct 20 '22 12:10

@gbartyzel, you mean replacing the default serializer (pickle) with dill? Or specifically in an experiment / pipeline?

erezalg, Oct 20 '22 12:10

@erezalg I mean replacing the default pickle serializer. From what I've found, dill is more capable than the default pickle (it can serialize lambdas, etc.). But it's just a suggestion ;)

gbartyzel, Oct 20 '22 12:10

@gbartyzel We thought the same thing, but as much as we love dill (and we do :) ), it's yet another dependency that many users don't have installed (as far as we know, it's less commonly used than pickle). We try to avoid adding dependencies that our users might not already have, because we've found that they are not well received.

Anyway, we've tested this with dill and it works; it will be available as soon as we can get you a nice interface for using it in pipelines.

erezalg, Oct 20 '22 13:10

@erezalg OK, that's a solid argument. In that case, it would be awesome to be able to choose the serializer engine manually.

gbartyzel, Oct 20 '22 13:10

@gbartyzel That is the plan :) Will update here once it's out!

erezalg, Nov 07 '22 11:11

Hey @schiegl and @gbartyzel! v1.10.0 is now out, with support for custom artifact serialization in pipelines.

pollfly, Apr 04 '23 11:04

@schiegl, @gbartyzel, closing this, please reopen if required.

jkhenning, Aug 06 '23 07:08