awesome-bert-japanese
Raw text segmentation or punctuation
Hello,
Thank you for collecting links to the BERT-based models for Japanese.
Just wanted to ask if you know of any models or investigations on segmenting raw text. After automatic speech recognition the text is not split at all; it is just a stream of characters. I am looking for something simple, like splitting the text into sentences, or something more complicated, like adding punctuation to the text. For example, NVIDIA provides punctuation models based on BERT and DistilBERT: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/punctuation_and_capitalization.html
It would be great if there were something similar for splitting raw Japanese text.
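To make the task concrete, here is a minimal sketch of the framing I have in mind: punctuation restoration treated as per-character labeling, where each character is tagged with the punctuation mark (if any) that should follow it. The function names `to_labels`, `apply_labels`, and `split_sentences` and the `PUNCTUATION` set are my own illustrative assumptions, not part of any library. In practice a BERT-style token classifier would predict the labels; this sketch only shows the data format around such a model.

```python
# Hypothetical helpers illustrating punctuation restoration as
# per-character labeling. No model is involved here; a classifier
# would be the component that predicts `labels` from `raw`.

# Assumed label set: the marks we want a model to restore.
PUNCTUATION = {"、", "。", "？", "！"}
SENTENCE_END = {"。", "？", "！"}


def to_labels(punctuated):
    """Strip punctuation and return (raw_text, labels).

    labels[i] is the mark that follows raw_text[i], or "" for none.
    Simplification: leading punctuation is dropped, and for
    consecutive marks only the last one is kept.
    """
    chars, labels = [], []
    for ch in punctuated:
        if ch in PUNCTUATION:
            if labels:
                labels[-1] = ch
            continue
        chars.append(ch)
        labels.append("")
    return "".join(chars), labels


def apply_labels(raw, labels):
    """Rebuild punctuated text from raw characters and their labels."""
    if len(raw) != len(labels):
        raise ValueError("raw text and labels must have the same length")
    return "".join(ch + label for ch, label in zip(raw, labels))


def split_sentences(raw, labels):
    """Split raw text into sentences after sentence-final labels."""
    sentences, current = [], []
    for ch, label in zip(raw, labels):
        current.append(ch + label)
        if label in SENTENCE_END:
            sentences.append("".join(current))
            current = []
    if current:  # trailing text without a sentence-final mark
        sentences.append("".join(current))
    return sentences


text = "今日は晴れです。明日は雨ですか？"
raw, labels = to_labels(text)
restored = apply_labels(raw, labels)
sentences = split_sentences(raw, labels)
```

With this format, `to_labels` could produce training pairs from already punctuated text, and `apply_labels` or `split_sentences` could post-process whatever labels a model predicts for ASR output.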