Erwin
How is the Whisper token vocabulary generated? RK provides an English token vocabulary (https://github.com/airockchip/rknn_model_zoo/blob/main/examples/whisper/model/vocab_en.txt) and a Chinese one as well. How should vocabularies for other languages be generated? Does the exported ONNX model support recognizing other languages? (Is changing the task_code ID and swapping in the corresponding token vocabulary enough to recognize another language?)

The following code exports the English vocab_en.txt:

```
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="en", task="transcribe")
tokenizer.set_prefix_tokens(language="en", task="transcribe", predict_timestamps=False)
tokenizer.save_pretrained("/data/workspace/github/whisper/ja")
```

But after switching the language to Japanese, the saved vocabulary is still the English one. Could any expert advise how to save a Japanese vocabulary?

```
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="japanese", task="transcribe")
tokenizer.set_prefix_tokens(language="japanese", task="transcribe", predict_timestamps=False)
tokenizer.save_pretrained("/data/workspace/github/whisper/ja")
```
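For reference, here is a minimal sketch of writing a tokenizer vocabulary out as a one-token-per-line text file, ordered by token id, which is roughly the shape of vocab_en.txt (the exact rknn_model_zoo file format may differ). The tiny hand-made dict below is a stand-in for `tokenizer.get_vocab()` so the snippet runs without downloading the model; with a real `WhisperTokenizer` you would pass `tokenizer.get_vocab()` instead:

```python
def dump_vocab(vocab: dict, path: str) -> None:
    """Write tokens one per line, sorted by token id.

    `vocab` maps token string -> token id, the same shape that
    tokenizer.get_vocab() returns in Hugging Face transformers.
    """
    tokens = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens))

# Stand-in vocab; replace with WhisperTokenizer.get_vocab() in real use.
demo_vocab = {"<|startoftranscript|>": 0, "hello": 1, "こんにちは": 2}
dump_vocab(demo_vocab, "vocab_demo.txt")
```

Note that this only dumps whatever vocabulary the tokenizer already holds; it does not by itself change which language that vocabulary covers.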