Pass kwargs to save()
I usually save my pandas DataFrames with compression (read_csv() reads them back without any special argument, since it infers the compression from the file extension):
df.to_csv("my_data.csv.gz", compression="gzip")
Is there a way to pass the compression kwarg to save()?
Posting @daavoo's feedback on the similar functionality for models:
- What output format is used for models (I only see a model binary)? How can I choose between different formats for the same framework?
- How do I add my own model format (e.g. I want to convert my Keras model to ONNX)?
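To make the two questions above concrete, here is a minimal sketch of what a pluggable format registry could look like. Everything in it (`register_format`, `save_model`, the `fmt` argument) is hypothetical and only illustrates the idea; it is not MLEM's actual API.

```python
import json
import pickle
from typing import Any, Callable, Dict

# Hypothetical registry: format name -> function that writes a model to a path.
_WRITERS: Dict[str, Callable[[Any, str], None]] = {}


def register_format(name: str):
    """Decorator that registers a writer function under a format name."""
    def decorator(func: Callable[[Any, str], None]):
        _WRITERS[name] = func
        return func
    return decorator


@register_format("pickle")
def _write_pickle(model: Any, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(model, f)


@register_format("json")
def _write_json(model: Any, path: str) -> None:
    # Only works for JSON-serializable objects; stands in for a custom format.
    with open(path, "w") as f:
        json.dump(model, f)


def save_model(model: Any, path: str, fmt: str = "pickle") -> None:
    """Look up the writer for `fmt` and use it; fail loudly on unknown formats."""
    if fmt not in _WRITERS:
        raise ValueError(f"Unknown format {fmt!r}; available: {sorted(_WRITERS)}")
    _WRITERS[fmt](model, path)
```

With something like this, a user-defined ONNX writer would just be one more function decorated with `register_format("onnx")`, and picking a format would be a matter of passing `fmt=...`.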
I also have a similar need for saving my model. Here is an example of how it looks:
import joblib
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.pipeline import Pipeline

from . import load_dataset

X, y = load_dataset()
...
pipeline = Pipeline(
    [
        ("tf", CustomTransformer(...)),
        ("clf", CustomClassifier(...)),
    ]
)
pipeline.fit(X, y)
# compress=3 turns on compression (level 3) for the dumped pipeline
joblib.dump(pipeline, "./model/model.pkl", compress=3)
It would be nice to have something like this but with save().
@mike0sv, is this possible at all? I mean, all extensions have their own custom options, so we need to add some mechanism to extensions for finding these options and consuming them from the top-level mlem.api.save call.
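A rough sketch of such a mechanism, assuming each extension declares its options as a dataclass and the top-level call validates kwargs against it. `PickleOptions` and this `save` are hypothetical names for illustration and do not reflect how mlem.api.save is implemented.

```python
import gzip
import pickle
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class PickleOptions:
    """Hypothetical options declared by a single extension."""
    compress: bool = False
    protocol: int = pickle.DEFAULT_PROTOCOL


def save(obj: Any, path: str, **kwargs: Any) -> None:
    """Top-level save that forwards only the kwargs the extension declares."""
    allowed = {f.name for f in fields(PickleOptions)}
    unknown = set(kwargs) - allowed
    if unknown:
        # Reject typos and unsupported options instead of silently ignoring them.
        raise TypeError(
            f"Unsupported options: {sorted(unknown)}; allowed: {sorted(allowed)}"
        )
    opts = PickleOptions(**kwargs)
    opener = gzip.open if opts.compress else open
    with opener(path, "wb") as f:
        pickle.dump(obj, f, protocol=opts.protocol)
```

Here `save(model, "model.pkl.gz", compress=True)` would write a gzip-compressed pickle, while `save(model, "model.pkl", compresion=True)` would fail with a `TypeError` naming the allowed options.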
Btw, it looks like it would be good to support this similarly to what you did for mlem deploy run kubernetes -h. I mean, how can a user find out what's supported if we just take kwargs and try to apply them somewhere in MLEM internals? It would be good to expose this somehow so the user can see what's possible to control.
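One way to expose this, sketched under the assumption that extension options are declared as dataclasses: the help text can be generated by introspecting the fields. `CsvWriterOptions` and `describe_options` are hypothetical names, not existing MLEM objects.

```python
from dataclasses import MISSING, dataclass, fields
from typing import List


@dataclass
class CsvWriterOptions:
    """Hypothetical options an extension could declare for saving DataFrames."""
    compression: str = "infer"
    sep: str = ","
    index: bool = True


def describe_options(options_cls: type) -> List[str]:
    """Return one help line per declared option: name, type, and default."""
    lines = []
    for f in fields(options_cls):
        has_default = f.default is not MISSING or f.default_factory is not MISSING
        default = f"default={f.default!r}" if f.default is not MISSING else (
            "default from factory" if has_default else "required"
        )
        type_name = getattr(f.type, "__name__", str(f.type))
        lines.append(f"--{f.name} ({type_name}, {default})")
    return lines


if __name__ == "__main__":
    print("\n".join(describe_options(CsvWriterOptions)))
```

The same field metadata could feed both a CLI `-h` output and the docstring of the Python API, so the list of supported kwargs stays in one place.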