
Have preprocessors in repository

Open merveenoyan opened this issue 3 years ago • 3 comments

We need to have preprocessor objects serialized, with their code in a separate script, in the repository where the user wants to persist the model. This applies to preprocessor objects such as ColumnTransformer or pd.pipe(), and especially to pd.pipe(), since it includes custom code with method chaining.

Jul 11 '22 12:07 merveenoyan

I don't quite understand the issue. ColumnTransformer would just be part of the model Pipeline and as such of the final model object, no?
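To illustrate the point, here is a minimal sketch, assuming scikit-learn and pandas are installed. The column names and toy data are hypothetical, and `pickle` stands in for whatever persistence format is actually used. It shows a ColumnTransformer living inside the Pipeline, so that persisting the pipeline persists the preprocessing too.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; the column names are hypothetical.
X = pd.DataFrame(
    {
        "age": [22, 35, 58, 41, 19, 63],
        "city": ["a", "b", "a", "c", "b", "c"],
    }
)
y = [0, 1, 1, 0, 0, 1]

# The ColumnTransformer is just the first step of the model pipeline.
preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ]
)
model = Pipeline([("preprocess", preprocessor), ("clf", LogisticRegression())])
model.fit(X, y)

# Round-trip the whole pipeline as one object (pickle is used only for
# illustration here).
restored = pickle.loads(pickle.dumps(model))
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

The restored object still contains the fitted ColumnTransformer under `restored.named_steps["preprocess"]`, so no separate preprocessor file is involved in this setup.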

Jul 11 '22 13:07 BenjaminBossan

@BenjaminBossan I feel like there might be a case (not sure though) where you have a ColumnTransformer followed by a pd.pipe() before you pass the data to the model for inference, or where you have custom transformers, in which case it might be good to have the code.
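A rough sketch of the kind of case meant here, assuming scikit-learn and pandas. The helper functions, column names, and data are all hypothetical, and pd.pipe() is taken to mean pandas' `DataFrame.pipe`. The method-chained preprocessing is wrapped in a `FunctionTransformer` so it can sit in the pipeline, but it is still custom code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def add_ratio(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical custom step: derive a new feature from two columns.
    return df.assign(ratio=df["a"] / df["b"])


def drop_raw(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical custom step: drop a column that is no longer needed.
    return df.drop(columns=["a"])


def chained_preprocessing(df: pd.DataFrame) -> pd.DataFrame:
    # Method chaining with DataFrame.pipe, as described in the comment above.
    return df.pipe(add_ratio).pipe(drop_raw)


X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 5.0, 8.0]})
y = [1.0, 2.0, 3.0, 4.0]

model = Pipeline(
    [
        ("custom", FunctionTransformer(chained_preprocessing)),
        ("reg", LinearRegression()),
    ]
)
model.fit(X, y)
predictions = model.predict(X)
```

Unlike the built-in transformers, functions such as `chained_preprocessing` are pickled by reference (module and name) rather than by value, so whoever loads the model needs the defining code available, which is one reason to keep that script alongside the model.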

Jul 11 '22 13:07 merveenoyan

The ColumnTransformer shouldn't be a separate object or separate code; it should be in the pipeline, so we don't have to do anything about that. But the rest of the preprocessing would be nice to have. The issue is that the hub doesn't follow the normal git workflow (you can't have PRs from your existing branches, for instance). That's why I'm not sure how to deal with it.

Jul 11 '22 13:07 adrinjalali

So do we still think we need this?

Sep 06 '22 16:09 adrinjalali

I don't think so! 😅

Sep 06 '22 16:09 merveenoyan