Can't reproduce CER on Common Voice yue using SenseVoice-Small

Open 123xxx12 opened this issue 1 year ago • 1 comments

Notice: To help resolve issues more efficiently, please raise issues following the template and fill in the details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

First, thanks for your great work on SenseVoice and for open-sourcing FunASR, which is a very useful tool.

However, I can't reproduce the CER reported in the paper.

My results on Common Voice 15 and 18 are below; the reported CER is 7.09%.

common voice 18
lan: zh source: commonvoice_yue_dev cnt: 3676 cer: 0.3275
lan: zh source: commonvoice_yue_test cnt: 3677 cer: 0.3381

common voice 15
lan: zh source: commonvoice_yue_v15_dev cnt: 2562 cer: 0.3279
lan: zh source: commonvoice_zh_HK_v15_dev cnt: 5593 cer: 0.3372
lan: zh source: commonvoice_yue_v15_test cnt: 2565 cer: 0.3388
lan: zh source: commonvoice_zh_HK_v15_test cnt: 5593 cer: 0.3361

Code

Model loading:

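# Imports as in the SenseVoice README; `model` is model.py from the SenseVoice repository.
from model import SenseVoiceSmall
from funasr.utils.postprocess_utils import rich_transcription_postprocess

def load_sensevoice(device, sensevoice_path="FunAudioLLM/SenseVoiceSmall"):
    # model_dir = "FunAudioLLM/SenseVoiceSmall"
    model, kwargs = SenseVoiceSmall.from_pretrained(model=sensevoice_path, device=device, hub="hf")
    return model, kwargs

model, sensevoice_kwargs = load_sensevoice(device)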

Forward pass:

outputs = []
for a_path in audio_path:
    res = model.inference(
        data_in=a_path,
        language="yue",  # "zh", "en", "yue", "ja", "ko", "nospeech"
        use_itn=True,
        **sensevoice_kwargs,
    )

    # Strip the rich-transcription tags from the raw model output.
    text = rich_transcription_postprocess(res[0][0]["text"])
    outputs.append(text)

Metric calculation: We first insert a space between each pair of adjacent characters, then use the wer metric from evaluate, which makes the result equivalent to CER.

from evaluate import load

wer = load("wer")

def compute_wer(refs, hyps, language):
    if language == "en":
        # Uppercase the hypotheses.
        hyps = [h.upper() for h in hyps]
    if language == "zh":
        # Space-separate characters so that each character counts as one "word".
        refs = [" ".join(ref) for ref in refs]
        hyps = [" ".join(hyp) for hyp in hyps]
    wer_score = wer.compute(predictions=hyps, references=refs)
    return wer_score
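
As a sanity check of the claim that WER over space-separated characters equals CER, here is a minimal, dependency-free sketch. It is not the evaluate implementation: edit_distance and corpus_error_rate are hypothetical helpers written for illustration, the two sentence pairs are made-up toy data, and the sketch assumes the error rate is pooled over the whole corpus (total edits divided by total reference tokens).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, tokenize):
    """Total edits divided by total reference tokens (assumed pooling)."""
    errors = sum(edit_distance(tokenize(r), tokenize(h))
                 for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total


# Toy data, not taken from Common Voice.
refs = ["你好世界", "早晨"]
hyps = ["你好世介", "早晨呀"]

# CER: tokenize each string into characters.
cer = corpus_error_rate(refs, hyps, tokenize=list)

# "WER" after inserting a space between characters, as in compute_wer.
spaced_wer = corpus_error_rate(
    [" ".join(r) for r in refs],
    [" ".join(h) for h in hyps],
    tokenize=str.split,
)

print(cer, spaced_wer)
```

Both values agree on this toy input, so under the pooling assumption the space-insertion trick does not change the metric itself.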

What have you tried?

What's your environment?

  • OS (e.g., Linux): Linux
  • FunASR Version (e.g., 1.0.0): 1.1.3
  • ModelScope Version (e.g., 1.11.0): 1.16.0
  • PyTorch Version (e.g., 2.0.0): 2.1.0
  • How you installed funasr (pip, source): pip
  • Python version: 3.10
  • GPU (e.g., V100M32): H100
  • CUDA/cuDNN version (e.g., cuda11.7):
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
  • Any other relevant information:

123xxx12 • Jul 24 '24 08:07