
Training API

Open cin-hy opened this issue 6 years ago • 1 comments

I would like to train the model on a different dataset. Does Juman++ have a training API that I could call?

cin-hy, Apr 08 '19 03:04

You need more than a dataset: you need a segmentation dictionary and an annotated corpus. We need to release our segmentation dictionary, but there is almost no documentation on how to use it.

For training, if you plan to use Jumandic for segmentation, you need only a corpus, and you can use the <build_dir>/src/jumandic/jpp_train_jumandic binary to train a model. This process still needs to be documented.

If not, please follow https://github.com/eiennohito/jumanpp-t9, which shows how to use Juman++ with your own dataset or segmentation standard.
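For the Jumandic case, a minimal sketch of checking that the trainer binary was actually built might look like the following. It only rebuilds the path quoted above and tests that the file is executable. The helper names `trainer_path` and `find_trainer` and the directory name `build` are assumptions made for illustration, not part of Juman++. No command-line flags are shown because the training process is undocumented; take them from the binary's own usage output.

```python
import os
from pathlib import Path


def trainer_path(build_dir):
    """Build the path quoted above: <build_dir>/src/jumandic/jpp_train_jumandic."""
    return Path(build_dir) / "src" / "jumandic" / "jpp_train_jumandic"


def find_trainer(build_dir):
    """Return the trainer path if it exists and is executable, else None."""
    path = trainer_path(build_dir)
    if path.is_file() and os.access(path, os.X_OK):
        return path
    return None


if __name__ == "__main__":
    # "build" is an assumed directory name; substitute your own build directory.
    found = find_trainer("build")
    if found is None:
        print("jpp_train_jumandic not found; build Juman++ first")
    else:
        print(f"trainer binary: {found}")
```

Returning `None` instead of raising keeps the check easy to call from a larger script that decides for itself whether a missing binary is an error.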

eiennohito, Apr 14 '19 06:04