Incremental/Online training of Models
According to the official documentation:
"Once a model has been trained, it can be fed previously unseen Examples to produce Predictions of their Outputs"
I've only seen the possibility of adding new Examples to a Dataset using `dataset.add(example)`, but not to a Model.
Is this possible and am I missing something?
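For reference, this is roughly the workflow I mean today; a minimal sketch, assuming a classification task (the feature names, labels, and trainer choice are just placeholders):

```java
import org.tribuo.Model;
import org.tribuo.MutableDataset;
import org.tribuo.classification.Label;
import org.tribuo.classification.LabelFactory;
import org.tribuo.classification.sgd.linear.LogisticRegressionTrainer;
import org.tribuo.impl.ArrayExample;
import org.tribuo.provenance.SimpleDataSourceProvenance;

public class TrainOnce {
    public static void main(String[] args) {
        LabelFactory factory = new LabelFactory();
        // A MutableDataset accepts new examples via add(), as mentioned above.
        MutableDataset<Label> dataset = new MutableDataset<>(
                new SimpleDataSourceProvenance("in-memory examples", factory), factory);

        // Two made-up examples with two features each.
        dataset.add(new ArrayExample<>(new Label("spam"),
                new String[]{"word-count", "link-count"}, new double[]{120, 4}));
        dataset.add(new ArrayExample<>(new Label("ham"),
                new String[]{"word-count", "link-count"}, new double[]{45, 0}));

        // Training consumes the whole Dataset in one go; once the Model exists
        // there's no equivalent of add() to push further examples into it.
        Model<Label> model = new LogisticRegressionTrainer().train(dataset);
        System.out.println(model);
    }
}
```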
There is a predict method on Model which produces predictions. Once a model has been trained you don't need to use a Dataset to feed it examples; it will automatically prune out previously unseen features and convert the example into the correct format. See the docs here - https://tribuo.org/learn/4.0/javadoc/org/tribuo/Model.html#predict-org.tribuo.Example-
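As a rough sketch (the class name, feature names, and feature values here are invented, and it assumes you already have a trained `Model<Label>`):

```java
import org.tribuo.Model;
import org.tribuo.Prediction;
import org.tribuo.classification.Label;
import org.tribuo.classification.LabelFactory;
import org.tribuo.impl.ArrayExample;

public class PredictUnseen {
    // Score a single new example against an already-trained model.
    public static Label classify(Model<Label> model) {
        // "emoji-count" was never seen at training time; predict() simply
        // ignores features the model doesn't know about.
        ArrayExample<Label> unseen = new ArrayExample<>(
                new LabelFactory().getUnknownOutput(),
                new String[]{"word-count", "link-count", "emoji-count"},
                new double[]{80, 2, 7});
        Prediction<Label> prediction = model.predict(unseen);
        return prediction.getOutput();
    }
}
```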
If you're asking about providing new training examples to retrain a model that already exists, then we're working on support for that but it's not ready yet.
Yes, this is what I'm looking for. We want to be able to load a stored model from disk, add new training examples, and save it back to disk. Great library. Good luck with this feature and with more serialization options.
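For what it's worth, the load/save part already works for us with plain Java serialization, since Tribuo models are Serializable; it's only the step of adding new training examples to the loaded model that's missing. A minimal sketch of what we do today (the class name and path handling are ours, not Tribuo API):

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.tribuo.Model;
import org.tribuo.classification.Label;

public class ModelStorage {
    // Tribuo 4.0 models implement java.io.Serializable, so plain object streams work.
    public static void save(Model<Label> model, Path path) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(path))) {
            oos.writeObject(model);
        }
    }

    @SuppressWarnings("unchecked") // the on-disk type isn't checked here
    public static Model<Label> load(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(path))) {
            return (Model<Label>) ois.readObject();
        }
    }
}
```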
Ok. Quick question: do you expect the feature or label spaces to change as you add more examples (i.e. will there be new labels or features in the new examples)? The first version of incremental model training will likely not allow new features or labels, as that's significantly simpler to implement.
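In the meantime you can check whether an incoming example would stay inside a trained model's existing feature and label domains via its feature map and output info; a small sketch (the class and method names are made up):

```java
import org.tribuo.Example;
import org.tribuo.Feature;
import org.tribuo.Model;
import org.tribuo.classification.Label;

public class DomainCheck {
    // Returns true if every feature and the label of the new example are
    // already in the trained model's domains, i.e. the no-new-features-or-labels case.
    public static boolean fitsExistingDomains(Model<Label> model, Example<Label> example) {
        for (Feature feature : example) {
            if (model.getFeatureIDMap().getID(feature.getName()) < 0) {
                return false; // a feature the model has never seen
            }
        }
        return model.getOutputIDInfo().getID(example.getOutput()) >= 0;
    }
}
```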
It would be great even without new features or labels. If I had to set priorities, they would be:
1. no new features & labels
2. no new features but with new labels
3. new features and new labels

But that's just for my use case. It's not a priority; I can retrain from scratch if I need to add a different feature or label.
Yeah, that's roughly where we expect most people to be. Options 2 & 3 are about equivalent in terms of implementation complexity, but adding new labels has statistical consequences: the new ones will be undertrained relative to the old ones, so we're thinking about ways to record that and potentially signal it to users.
Mind if I rename this issue? We'll use it to track the integration of incremental training support.
Sure. Go ahead and rename it.