omikuji icon indicating copy to clipboard operation
omikuji copied to clipboard

Predefined tree architecture

Open rabitwhte opened this issue 4 years ago • 6 comments

Hey @tomtung!

Do you think it would be possible to predefine tree architecture for parabel? It could be useful when labels are hierarchical by definition. Do you have any hints for implementation? Like where to start?

Regards, rabitwhte

rabitwhte avatar Jul 02 '20 18:07 rabitwhte

Or maybe, if I know the tree architecture, then the main job of parabel is done? And all I should do is to train 1 vs all classifiers in each node? I will be grateful for some insights.

rabitwhte avatar Jul 02 '20 18:07 rabitwhte

Unfortunately using pre-defined tree structure is not yet supported. I won't have time to implement it soon, but contributions are welcome :)

Tree-building is half of parabel's job, and you can certainly manually train all the 1-vs-all classifiers yourself based on the given tree. If you want to utilize omikuji for serving, you can try figure out the cbor format with which omikuji saves trees, and save your trees in that format (may be a bit tedious but should be straightforward).

tomtung avatar Jul 02 '20 19:07 tomtung

Hmm, but cbor files keep not only tree architecture information, but also the classifier themselves, right? Creating trees is an intermediate step in parabel, but can they be saved /edited after created?

What kind of classifiers is your implementation using?

rabitwhte avatar Jul 03 '20 09:07 rabitwhte

Hmm, but cbor files keep not only tree architecture information, but also the classifier themselves, right?

That's correct.

Creating trees is an intermediate step in parabel, but can they be saved /edited after created?

For better parallelization, tree creation and classifier training actually happen in tandem, which is why it's nontrivial to use a pre-defined tree structure without some major refactoring :(

What kind of classifiers is your implementation using?

By default it uses coordinate descent code ported from liblinear to solve a L2-regularized linear SVM:

https://github.com/tomtung/omikuji/blob/22f4c139036a1aae61ff548a6f375ae026849bd1/src/model/liblinear.rs#L189

tomtung avatar Jul 04 '20 20:07 tomtung

Hey @tomtung,

another question on that. When doing the prediction, how does the algorithm choose which way to go on the tree node? This is multilabel algorithm, so it cannot just go with the 'highest probability path', it does explore other branches as well right? How does it decide which branch to go? Or maybe it just shows leafs from the very last branch?

Regards, rabitwhte

rabitwhte avatar Jul 20 '20 06:07 rabitwhte

It does beam search, as described in the paper as well.

tomtung avatar Jul 20 '20 07:07 tomtung