pyctcdecode

How Many HotWords

Open • finardi opened this issue 3 years ago • 1 comment

Hello! Is there a recommended range for how many hotwords I can pass to the decoder? Is 1000 too many? I've fine-tuned a model for Portuguese, and I have a domain-specific vocabulary for banking/finance.

finardi avatar Apr 01 '22 19:04 finardi

Have you tested your performance on a test set? I'm not sure exactly at what point the number of hotwords starts degrading performance, but 1000 sounds like a lot. At some point it would probably be better to retrain the language model with an upgraded vocabulary and keep hotwords for a small number of more targeted words.

lopez86 avatar Apr 16 '22 16:04 lopez86
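
One way to act on the test-set suggestion above is to decode the same held-out utterances with progressively larger hotword lists and compare word error rate (WER). The sketch below is only an illustration: `word_edit_distance` and `sweep_hotword_counts` are hypothetical helper names, and `decode_fn` is an assumed wrapper that you supply around your own decoder. It assumes the hotword list is ordered from most to least important.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sweep_hotword_counts(
    decode_fn: Callable[[Any, List[str]], str],
    samples: Sequence[Tuple[Any, str]],
    hotwords: Sequence[str],
    sizes: Sequence[int],
) -> Dict[int, float]:
    """Return corpus-level WER for each hotword-list size in `sizes`.

    `samples` holds (decoder_input, reference_text) pairs; for each size n,
    only the first n hotwords are passed to `decode_fn`.
    """
    results: Dict[int, float] = {}
    for n in sizes:
        subset = list(hotwords[:n])
        errors = 0
        total_words = 0
        for decoder_input, reference in samples:
            ref_words = reference.split()
            hyp_words = decode_fn(decoder_input, subset).split()
            errors += word_edit_distance(ref_words, hyp_words)
            total_words += len(ref_words)
        results[n] = errors / max(total_words, 1)
    return results
```

With pyctcdecode, `decode_fn` could be something like `lambda logits, hw: decoder.decode(logits, hotwords=hw or None, hotword_weight=10.0)`, where `decoder` is your already-built decoder and the weight is a value to tune rather than a recommendation. If WER stops improving or gets worse as the list grows, that is a sign the remaining vocabulary belongs in a retrained language model rather than in the hotword list.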