Incremental/Online training of Models
According to the official documentation:
"Once a model has been trained, it can be fed previously unseen Examples to produce Predictions of their Outputs"
I've only seen the possibility of adding new Examples to a Dataset using `dataset.add(example)`, but not to a Model.
Is this possible and am I missing something?
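For reference, this is roughly the workflow I mean today; a minimal sketch, assuming a classification task (the feature names, labels, and trainer choice are just placeholders):

```java
import org.tribuo.Model;
import org.tribuo.MutableDataset;
import org.tribuo.classification.Label;
import org.tribuo.classification.LabelFactory;
import org.tribuo.classification.sgd.linear.LogisticRegressionTrainer;
import org.tribuo.impl.ArrayExample;
import org.tribuo.provenance.SimpleDataSourceProvenance;

public class TrainOnce {
    public static void main(String[] args) {
        LabelFactory factory = new LabelFactory();
        // A MutableDataset accepts new examples via add(), as mentioned above.
        MutableDataset<Label> dataset = new MutableDataset<>(
                new SimpleDataSourceProvenance("in-memory examples", factory), factory);

        // Two made-up examples with two features each.
        dataset.add(new ArrayExample<>(new Label("spam"),
                new String[]{"word-count", "link-count"}, new double[]{120, 4}));
        dataset.add(new ArrayExample<>(new Label("ham"),
                new String[]{"word-count", "link-count"}, new double[]{45, 0}));

        // Training consumes the whole Dataset in one go; once the Model exists
        // there's no equivalent of add() to push further examples into it.
        Model<Label> model = new LogisticRegressionTrainer().train(dataset);
        System.out.println(model);
    }
}
```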
There is a predict method on Model which produces predictions. Once a model has been trained you don't need to use a Dataset to feed it examples; it will automatically prune out previously unseen features and convert the example into the correct format. See the docs here - https://tribuo.org/learn/4.0/javadoc/org/tribuo/Model.html#predict-org.tribuo.Example-
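As a rough sketch (the class name, feature names, and feature values here are invented, and it assumes you already have a trained `Model<Label>`):

```java
import org.tribuo.Model;
import org.tribuo.Prediction;
import org.tribuo.classification.Label;
import org.tribuo.classification.LabelFactory;
import org.tribuo.impl.ArrayExample;

public class PredictUnseen {
    // Score a single new example against an already-trained model.
    public static Label classify(Model<Label> model) {
        // "emoji-count" was never seen at training time; predict() simply
        // ignores features the model doesn't know about.
        ArrayExample<Label> unseen = new ArrayExample<>(
                new LabelFactory().getUnknownOutput(),
                new String[]{"word-count", "link-count", "emoji-count"},
                new double[]{80, 2, 7});
        Prediction<Label> prediction = model.predict(unseen);
        return prediction.getOutput();
    }
}
```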
If you're asking about providing new training examples to retrain a model that already exists, then we're working on support for that but it's not ready yet.
Yes, this is what I'm looking for. We want to be able to load a stored model from disk, add new training examples, and save it back to disk. Great library. Good luck with this feature and with more serialization options.
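For what it's worth, the load/save part already works for us with plain Java serialization, since Tribuo models are Serializable; it's only the step of adding new training examples to the loaded model that's missing. A minimal sketch of what we do today (the class name and path handling are ours, not Tribuo API):

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.tribuo.Model;
import org.tribuo.classification.Label;

public class ModelStorage {
    // Tribuo 4.0 models implement java.io.Serializable, so plain object streams work.
    public static void save(Model<Label> model, Path path) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(path))) {
            oos.writeObject(model);
        }
    }

    @SuppressWarnings("unchecked") // the on-disk type isn't checked here
    public static Model<Label> load(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(path))) {
            return (Model<Label>) ois.readObject();
        }
    }
}
```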
Ok. Quick question: do you expect the feature or label spaces to change as you add more examples (i.e. will there be new labels or features in the new examples)? The first version of incremental model training will likely not allow new features or labels, as that's significantly simpler to implement.
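In the meantime you can check whether an incoming example would stay inside a trained model's existing feature and label domains via its feature map and output info; a small sketch (the class and method names are made up):

```java
import org.tribuo.Example;
import org.tribuo.Feature;
import org.tribuo.Model;
import org.tribuo.classification.Label;

public class DomainCheck {
    // Returns true if every feature and the label of the new example are
    // already in the trained model's domains, i.e. the no-new-features-or-labels case.
    public static boolean fitsExistingDomains(Model<Label> model, Example<Label> example) {
        for (Feature feature : example) {
            if (model.getFeatureIDMap().getID(feature.getName()) < 0) {
                return false; // a feature the model has never seen
            }
        }
        return model.getOutputIDInfo().getID(example.getOutput()) >= 0;
    }
}
```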
It would be great even without new features or labels. If I had to set priorities, they would be:
1. no new features & labels
2. no new features but with new labels
3. new features and new labels

But that's just for my use case. It's not a priority; I can retrain from scratch if I need to add a different feature or label.
Yeah, that's roughly where we expect most people to be. Options 2 & 3 are about equivalent in terms of implementation complexity, but adding new labels has statistical consequences: the new ones will be undertrained relative to the old ones, so we're thinking about ways to record that and potentially signal it to users.
Mind if I rename this issue? We'll use it to track the integration of incremental training support.
Sure. Go ahead and rename it.