
Fix evaluation and other minor issues to adapt to multi-label classification

Open callzhang opened this issue 5 years ago • 4 comments

callzhang avatar Dec 22 '20 06:12 callzhang

Tried to resolve the conflicts but it seems that you have changed the structure a lot. Any suggestion on how to merge?

callzhang avatar Jan 01 '21 08:01 callzhang

Yes, I refactored a lot. Basically, the TensorFlow components are renamed to their original names with a TF suffix. Sorry, you will have to do a line-by-line merge, as I did when merging your commits into the new release. If you'd like to contribute, please rebase on dev.

By the way, 2.1 now officially depends on the wonderful Hugging Face transformers, so it would also be great if you want to use their BERT. Here is some reference code: https://github.com/hankcs/HanLP/commit/1fe90f7040d591176712240285c8a514089ce73b

hankcs avatar Jan 01 '21 08:01 hankcs

I eventually wrote my own script for the multi-label classification task. It uses a customized BCE loss with class weights to deal with imbalanced classes, macro-F1 as the metric, and AdamW with AMSGrad enabled. Together with data augmentation, this improved performance. I will continue to use HanLP as an exploration tool and try to contribute when I can. Thanks!
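For readers following along, here is a minimal plain-Python sketch of the two pieces named above: a positive-weighted BCE loss and macro-F1. It is not the script mentioned in this comment. The function names (`pos_weights`, `weighted_bce`, `macro_f1`) and the negatives-over-positives weighting rule are illustrative assumptions, chosen only to make the arithmetic visible.

```python
import math


def pos_weights(targets):
    """Per-label weight = (#negatives / #positives); assumed heuristic for imbalance."""
    n = len(targets)
    weights = []
    for j in range(len(targets[0])):
        pos = sum(row[j] for row in targets)
        # Fall back to 1.0 when a label never occurs, to avoid dividing by zero.
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def weighted_bce(logits, targets, pos_weight):
    """Mean BCE over all (sample, label) cells; positive terms scaled per label."""
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y, w in zip(row_logits, row_targets, pos_weight):
            # Numerically stable log(sigmoid(z)).
            if z >= 0:
                log_p = -math.log1p(math.exp(-z))
            else:
                log_p = z - math.log1p(math.exp(z))
            # log(1 - sigmoid(z)) = log(sigmoid(z)) - z
            log_not_p = log_p - z
            total += -(w * y * log_p + (1 - y) * log_not_p)
            count += 1
    return total / count


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1; a label with no positives at all scores 0."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels
```

If the model is trained with PyTorch (an assumption; the comment does not name a framework), the same loss is available as `torch.nn.BCEWithLogitsLoss(pos_weight=...)`, and AMSGrad is switched on with `torch.optim.AdamW(params, amsgrad=True)`.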

callzhang avatar Jan 01 '21 08:01 callzhang

Sure, feel free to explore 2.1.

hankcs avatar Jan 01 '21 08:01 hankcs