awesome-bert-japanese

Raw text segmentation or punctuation

Open marlon-br opened this issue 2 years ago • 0 comments

Hello,

Thank you for collecting links to the BERT-based models for Japanese.

Just wanted to ask if you know of any models or investigations on segmenting raw text. After automatic speech recognition the text is not split at all; it is just one character after another. I am looking for something simple, like splitting the text into sentences, or something more complicated, like adding punctuation to the text. For example, NVIDIA provides punctuation models based on BERT and DistilBERT: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/punctuation_and_capitalization.html

It would be great if there were something for splitting raw text in Japanese.
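
To make the task concrete, here is a minimal sketch of one possible framing: per-character labeling, where each character of the unsegmented text gets a label naming the punctuation mark that should follow it. This framing and the label names (`COMMA`, `PERIOD`, `QUESTION`, `O`) are assumptions for illustration, not something taken from an existing Japanese model. A real model would have to predict the labels; here they are derived from an already punctuated reference sentence, which is also how training data could be built.

```python
# Hypothetical label set: "O" means no punctuation follows the character.
PUNCT = {"、": "COMMA", "。": "PERIOD", "？": "QUESTION"}
MARKS = {label: mark for mark, label in PUNCT.items()}


def to_labels(punctuated: str) -> tuple[str, list[str]]:
    """Strip punctuation; return the raw text and one label per character."""
    chars: list[str] = []
    labels: list[str] = []
    for ch in punctuated:
        if ch in PUNCT:
            if labels:  # attach the mark to the preceding character
                labels[-1] = PUNCT[ch]
        else:
            chars.append(ch)
            labels.append("O")
    return "".join(chars), labels


def restore(raw: str, labels: list[str]) -> str:
    """Re-insert punctuation after each character whose label is not "O"."""
    return "".join(ch + MARKS.get(label, "") for ch, label in zip(raw, labels))


def split_sentences(raw: str, labels: list[str]) -> list[str]:
    """Split raw text after characters labeled as sentence-final."""
    sentences: list[str] = []
    current = ""
    for ch, label in zip(raw, labels):
        current += ch
        if label in ("PERIOD", "QUESTION"):
            sentences.append(current)
            current = ""
    if current:  # keep any trailing text without a final mark
        sentences.append(current)
    return sentences


reference = "今日は晴れです。明日は雨ですか？"
raw, labels = to_labels(reference)   # raw is what ASR output would look like
print(raw)
print(restore(raw, labels))          # punctuation added back
print(split_sentences(raw, labels))  # simple sentence split
```

With this framing, both sentence splitting and punctuation restoration reduce to predicting one label per character, which is the kind of output a BERT-style token classifier produces.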

marlon-br, Jul 14 '21 10:07