Pass kwarg to save

Open cwerner opened this issue 3 years ago • 3 comments

I usually save my pandas DataFrames with compression (read_csv() can also read them back without any extra argument):

df.to_csv("my_data.csv.gz", compression="gzip")

Is there a way to pass the compression kwarg to save()?
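One way a save() function could support this is by forwarding extra keyword arguments to the writer for the object's type. The sketch below is hypothetical and not MLEM's API: `save_dataframe` is an illustrative stand-in that passes any extra kwargs, such as `compression`, straight through to `DataFrame.to_csv`.

```python
import os
import tempfile

import pandas as pd


def save_dataframe(df: pd.DataFrame, path: str, **writer_kwargs) -> None:
    """Hypothetical save(): forward extra kwargs (e.g. compression) to to_csv."""
    # Drop the index by default so the file round-trips cleanly; callers may override it
    writer_kwargs.setdefault("index", False)
    df.to_csv(path, **writer_kwargs)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
    path = os.path.join(tempfile.mkdtemp(), "my_data.csv.gz")
    save_dataframe(df, path, compression="gzip")
    # read_csv infers gzip from the .gz suffix (its compression default is "infer")
    restored = pd.read_csv(path)
    print(restored.equals(df))
```

The catch, as the later comments point out, is that each writer accepts different options, so blind forwarding gives the user no way to discover what is supported.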

cwerner, May 01 '22 19:05

Posting @daavoo's feedback on similar functionality for models:

  • What output format is used for models (I only see a model binary)? How can I choose between different formats for the same framework?
  • How do I add my own model format (e.g., I want to convert my Keras model to ONNX)?
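Both questions amount to a per-framework registry of named formats, with a default and a hook for users to register their own (for example, an ONNX exporter). A minimal sketch, assuming a hypothetical `FORMATS` registry, `register_format` decorator, and `save` function that are not part of MLEM; the two built-in formats use only the standard library, and an ONNX writer is left for the user to plug in.

```python
import json
import pickle
from typing import Any, Callable

# Hypothetical registry: format name -> function(obj, path) that writes obj to path
FORMATS: dict[str, Callable[[Any, str], None]] = {}


def register_format(name: str):
    """Decorator that adds a writer under `name`; users can add their own (e.g. "onnx")."""
    def decorator(writer: Callable[[Any, str], None]):
        FORMATS[name] = writer
        return writer
    return decorator


@register_format("pickle")
def write_pickle(obj: Any, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(obj, f)


@register_format("json")
def write_json(obj: Any, path: str) -> None:
    with open(path, "w") as f:
        json.dump(obj, f)


def save(obj: Any, path: str, fmt: str = "pickle") -> None:
    """Hypothetical save() that lets the caller choose among registered formats."""
    try:
        writer = FORMATS[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; available: {sorted(FORMATS)}") from None
    writer(obj, path)
```

A user-defined format would be registered the same way, by decorating a function with `@register_format("onnx")` that calls whichever converter they prefer.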

aguschin, Aug 17 '22 08:08

I have a similar need when saving my model. Here is an example of what I currently do:

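import joblib
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.pipeline import Pipeline
from . import load_dataset  # project-local helper

X, y = load_dataset()

...  # CustomTransformer / CustomClassifier definitions omitted

pipeline = Pipeline(
    [
        ("tf", CustomTransformer(...)),
        ("clf", CustomClassifier(...)),
    ]
)
pipeline.fit(X, y)

# compress=3 saves the pickle with zlib compression at level 3 (valid range 0-9)
joblib.dump(pipeline, "./model/model.pkl", compress=3)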

It would be nice to be able to pass options like this through save().

TheFirstMe, Aug 30 '22 09:08

@mike0sv, is this possible at all? Each extension has its own custom options, so we would need a mechanism that lets extensions declare these options and consume them from the top-level mlem.api.save call.

Btw, it would be good to support this the same way you did for mlem deploy run kubernetes -h. If we just accept kwargs and try to apply them somewhere in MLEM internals, how would a user find out what's supported? It would be good to expose these options so users can see what they are able to control.
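A minimal sketch of the mechanism described above, under a hypothetical design rather than anything MLEM currently provides: each extension declares its accepted options as a dataclass, the top-level `save` routes kwargs to the extension that handles the object and rejects unknown names, and `describe_options` renders the same declaration as help text, in the spirit of a `-h` screen. `Extension`, `PickleOptions`, `EXTENSIONS`, `save`, and `describe_options` are all illustrative names.

```python
import gzip
import pickle
from dataclasses import dataclass, fields
from typing import Any, Callable, Type


@dataclass
class PickleOptions:
    """Options a hypothetical pickle extension accepts."""
    compresslevel: int = 0  # 0 = plain pickle; 1-9 = gzip compression level
    protocol: int = pickle.HIGHEST_PROTOCOL


@dataclass
class Extension:
    name: str
    handles: Type  # object type this extension can save
    options: Type  # dataclass declaring the kwargs it accepts
    writer: Callable[[Any, str, Any], None]


def _write_pickle(obj: Any, path: str, opts: PickleOptions) -> None:
    data = pickle.dumps(obj, protocol=opts.protocol)
    if opts.compresslevel:
        data = gzip.compress(data, compresslevel=opts.compresslevel)
    with open(path, "wb") as f:
        f.write(data)


# Hypothetical registry; real extensions would register more specific types first
EXTENSIONS = [Extension("pickle", object, PickleOptions, _write_pickle)]


def find_extension(obj: Any) -> Extension:
    for ext in EXTENSIONS:
        if isinstance(obj, ext.handles):
            return ext
    raise TypeError(f"no extension can save {type(obj).__name__}")


def save(obj: Any, path: str, **kwargs: Any) -> None:
    """Route kwargs to the matching extension, rejecting undeclared option names."""
    ext = find_extension(obj)
    allowed = {f.name for f in fields(ext.options)}
    unknown = set(kwargs) - allowed
    if unknown:
        raise TypeError(f"{ext.name} does not accept {sorted(unknown)}; see describe_options()")
    ext.writer(obj, path, ext.options(**kwargs))


def describe_options(obj: Any) -> str:
    """Help text built from the same declaration that save() validates against."""
    ext = find_extension(obj)
    lines = [f"Options for {ext.name}:"]
    for f in fields(ext.options):
        type_name = getattr(f.type, "__name__", str(f.type))
        lines.append(f"  {f.name}: {type_name} = {f.default!r}")
    return "\n".join(lines)
```

Keeping validation and help output driven by one declared schema means the two cannot drift apart, which addresses the discoverability concern.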

aguschin, Nov 09 '22 07:11