skops
Have utils to enable easier inference
The features we added to enable the inference widget (e.g. having an example input in the config) can be repurposed to help users understand the model, run inference through the API, and easily build demos for their models. I don't know if it's ambiguous, but what I have in mind is the following (I'm really trying hard not to make this a brain dump 🧠🥟):
- (@adrinjalali's idea) Utils that make inference pandas-in, pandas-out would be very handy for data scientists.
- For people who'd like to build demos easily: the dataframe component in Gradio expects the dataframe to be of a certain size (number of columns). Because we already provide the example input and the task name inside the config, a util could use them to create demos with a few lines of code. Gradio has a similar integration with transformers that leverages the pipeline and task under the hood and lets you build a demo with one line of code.
- Our examples and documentation are minimal when it comes to inference. We should add an inference section that uses the example input in the config to run inference programmatically, rather than only using it for the inference widget.
Maybe you can add your ideas to this issue. Safe persistence is a bigger issue that has already been discussed, so it is out of scope for this one.
Tasks to be done:
- [ ] Gradio integration (will get in touch with them)
- [ ] Add an inference section to the user guide for hub_utils
Let me know if there's any other task to be done.
These are basically 3 different issues, and I agree we should address all of them.
Not sure if it fits here, but I was wondering about predicting labels instead of ints for classification. AFAIK, you can already fit a classifier on string labels (e.g. y=["setosa", "versicolor", "virginica"]), but that's often not desired. So if we provide the target encoding as an additional feature (basically label_encoder.classes_), we could have the model output ints and then have the widget convert them to labels.
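A small sketch of the proposed mapping, assuming the only extra metadata shipped with the model is the `classes_` array of a fitted `LabelEncoder` (the variable `label_classes` stands in for that hypothetical metadata; how skops or the widget would store it is not decided here). The model is trained on integer codes, and its integer predictions are mapped back to labels by indexing into `classes_`, which is equivalent to `LabelEncoder.inverse_transform`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# String targets, as a user might have them before encoding.
y_str = iris.target_names[iris.target]

# The user encodes the labels and trains on the integer codes.
encoder = LabelEncoder().fit(y_str)
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, encoder.transform(y_str))

# Hypothetical extra metadata shipped next to the model: just classes_.
label_classes = encoder.classes_.tolist()

# Inference side: the model returns ints, which are mapped back to labels.
int_preds = clf.predict(iris.data[:3])
label_preds = np.asarray(label_classes)[int_preds]

# Show both the raw output and the decoded label.
for code, label in zip(int_preds, label_preds):
    print(code, label)
```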
I'd rather show the model output there. If users want strings as outputs, they can train the model that way themselves.
The default could stay the same, but if a user explicitly provides the label encoding, why not show both? Let's say the user has already trained the model on encoded labels (which I would assume is typical); they don't need to retrain just to show the original labels.
They could add that to their pipeline as a last step kinda thing, or wrap their already trained model. I don't want us to quickly end up in a feature-creep situation. If it can be solved easily by the user with sklearn alone, I don't think skops should do anything about it. But I'm more than happy to have an example in our docs explaining how users can do it.
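A minimal sketch of the "wrap their already trained model" option using only scikit-learn. `LabelDecodingClassifier` is a hypothetical name, not a class that exists in scikit-learn or skops. It assumes the wrapped model was trained on integer codes `0..n_classes-1` and saw every code, so a prediction can be used directly as an index into the label array. Inheriting from `BaseEstimator` only provides `get_params` and a readable repr.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


class LabelDecodingClassifier(BaseEstimator):
    """Hypothetical wrapper (not part of sklearn or skops) around an already
    fitted classifier trained on integer codes 0..n_classes-1."""

    def __init__(self, estimator, classes):
        self.estimator = estimator
        self.classes = classes

    @property
    def classes_(self):
        # Original labels, in the order of the integer codes.
        return np.asarray(self.classes)

    def predict(self, X):
        # Predict integer codes with the wrapped model, then look up their labels.
        return self.classes_[self.estimator.predict(X)]

    def predict_proba(self, X):
        # Column i matches classes_[i], assuming the model saw every code 0..n-1.
        return self.estimator.predict_proba(X)


iris = load_iris()
# Already trained on integer targets; no retraining is needed to decode labels.
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)
wrapped = LabelDecodingClassifier(clf, iris.target_names)
print(wrapped.predict(iris.data[:2]))
```

A `Pipeline` only passes inputs through its transformers before calling the final step's `predict`, so post-processing the outputs needs something at the estimator level, such as a wrapper like this one.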
they could add that to their pipeline as a last step kinda thing
If that's easily possible, we should show how. I would guess that showing the label instead of an int would be more commonly desired.
@merveenoyan I'm not sure what we can do about the gradio part of this issue from our side.
@BenjaminBossan @adrinjalali I added two tasks above; let me know if there's anything else we should do.
We kinda decided not to have a full-blown example for inference, since it takes quite a while to run. Instead, we should document it well in the user guide, in a way that building the docs doesn't run that code.
Related:
- https://github.com/gradio-app/gradio/issues/2090
- https://huggingface.co/spaces/scikit-learn/tabular-playground
Closing as it's complete now.