skops
Have preprocessors in repository
We need to have the preprocessor objects serialized, and their code in a separate script, in the repository the user wants to persist the model in. This covers preprocessor objects such as ColumnTransformer or pd.pipe(), and especially pd.pipe(), as it includes custom code with method chaining.
I don't quite understand the issue. ColumnTransformer would just be part of the model Pipeline and as such of the final model object, no?
@BenjaminBossan I feel like there might be a case (not sure though) where you have a ColumnTransformer followed by a pd.pipe(), and then you pass the data to the model for inference. There could also be custom transformers, for which it might be good to have the code.
The ColumnTransformer shouldn't be a separate object or separate code; it should be in the pipeline, so we don't have to do anything about that. But the rest of the preprocessing would be nice to have. The issue is that the hub doesn't follow the normal git workflow (you can't have PRs from your existing branches, for instance). That's why I'm not sure how to deal with it.
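For illustration, here is a minimal sketch of what "in the pipeline" means: the ColumnTransformer is a step of the Pipeline, so persisting the fitted pipeline persists the preprocessing with it. It assumes scikit-learn and pandas are installed. The data, the column names (`age`, `city`), and the choice of estimator are made up for the example, and `pickle` is used only as a stand-in for whatever persistence format the repository ends up using.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; the column names are hypothetical.
X = pd.DataFrame(
    {
        "age": [22, 35, 47, 51, 29, 63],
        "city": ["a", "b", "a", "c", "b", "c"],
    }
)
y = [0, 1, 0, 1, 0, 1]

# Column-wise preprocessing lives inside the pipeline as its first step.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ]
)
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)

# Persisting the pipeline persists the fitted ColumnTransformer too.
restored = pickle.loads(pickle.dumps(model))
predictions = restored.predict(X)
```

This only covers preprocessing that is a pipeline step. Anything applied to the DataFrame before it reaches the pipeline, such as a function passed to `.pipe()`, is not part of the saved object.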
So do we still think we need this?
I don't think so! 😅