[WIP] Add tokenize-hotwords option

Open pkufool opened this issue 1 year ago • 3 comments

This PR adds a tokenize-hotwords option for hotwords. Currently, tokenizing hotwords is supported only for models trained with cjkchar, bpe, or cjkchar+bpe modeling units. Those who want to use hotwords with other modeling units can set --tokenize-hotwords to false and pre-tokenize the hotwords before passing them to the decoder.
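For the pre-tokenized path, a minimal sketch of what preparing the hotwords might look like is shown here. It assumes the decoder expects one hotword per line with tokens separated by spaces; `pretokenize_hotwords` and `char_tokenize` are hypothetical helpers written for illustration and are not part of sherpa-onnx. The per-character tokenizer only stands in for whatever tokenizer matches your model's modeling units.

```python
from typing import Callable, Iterable, List


def char_tokenize(phrase: str) -> List[str]:
    """Illustrative tokenizer: one token per non-whitespace character."""
    return [ch for ch in phrase if not ch.isspace()]


def pretokenize_hotwords(
    hotwords: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> List[str]:
    """Turn raw hotword phrases into space-separated token lines.

    `tokenize` must produce tokens from the same vocabulary the model
    was trained on; this function only joins them with spaces.
    """
    lines = []
    for phrase in hotwords:
        phrase = phrase.strip()
        if not phrase:
            continue  # skip blank entries
        lines.append(" ".join(tokenize(phrase)))
    return lines


if __name__ == "__main__":
    # Print the pre-tokenized lines, one hotword per line.
    for line in pretokenize_hotwords(["礼拜二", "你好 世界"], char_tokenize):
        print(line)
```

The resulting lines would then be written to the hotwords file used with --tokenize-hotwords set to false. Swap `char_tokenize` for a tokenizer that matches your own modeling units.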

pkufool avatar Jun 21 '24 08:06 pkufool

Hi @pkufool, is there any update on this PR? Thanks beforehand!

w11wo avatar Jul 26 '24 09:07 w11wo

> Hi @pkufool, is there any update on this PR? Thanks beforehand!

Oh, I thought this was merged, will have a look.

pkufool avatar Jul 29 '24 05:07 pkufool

Hi @pkufool, sorry to keep tagging you. Are there any updates? 🙏

w11wo avatar Aug 26 '24 21:08 w11wo