FunASR Timestamps of words seems not correct in file transcription service

Open electroniccc opened this issue 1 year ago • 1 comments

When I convert some audio files, the timestamps in the returned result don't look correct. For example, the total duration of the audio file is about 6 minutes (roughly 360 s), but the timestamp of the last word is about 600 s. The file info: The result:

My environment:
OS: Ubuntu 22.04 (WSL)
Docker image:

a186f040b0a1   registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.3.0   "/bin/bash"   3 weeks ago   Up 30 seconds             0.0.0.0:10095->10095/tcp   funasr

Start command:

nohup bash run_server.sh \
  --certfile "" \
  --model_dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx \
  > log.out 2>&1 &

The sample file: https://drive.google.com/file/d/1YS3gWovNJIPDjN9gs-vBxoNx7dUvvO2l/view?usp=drive_link

Dec 11 '23 12:12 electroniccc

Currently, timestamps become misaligned after the text passes through ITN (inverse text normalization). The issue has been fixed, and the fix will be released in the next version.

Dec 12 '23 03:12 lyblsgo

Try funasr:funasr-runtime-sdk-cpu-0.4.0.

Jan 03 '24 08:01 lyblsgo

I tried funasr-runtime-sdk-cpu-0.4.2; the issue still exists.

Jan 28 '24 10:01 electroniccc

If the issue persists, please provide detailed steps to reproduce, as well as server and client logs.
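One way to make such a report concrete is to check the returned timestamps against the audio duration. The following is a minimal sketch, assuming the input is a PCM WAV file and the timestamps have already been parsed into a list of [start_ms, end_ms] pairs in milliseconds. The helper names `wav_duration_ms` and `out_of_range` are hypothetical and not part of FunASR.

```python
import wave


def wav_duration_ms(path):
    """Return the duration of a PCM WAV file in milliseconds."""
    with wave.open(path, "rb") as wf:
        return 1000.0 * wf.getnframes() / wf.getframerate()


def out_of_range(timestamps_ms, duration_ms, tolerance_ms=0.0):
    """Return (index, [start, end]) for every pair that is malformed
    (negative start, or end before start) or ends after the audio does.

    Assumption: each entry is a [start_ms, end_ms] pair in milliseconds.
    """
    bad = []
    for i, (start, end) in enumerate(timestamps_ms):
        if start < 0 or end < start or end > duration_ms + tolerance_ms:
            bad.append((i, [start, end]))
    return bad
```

Attaching the output of such a check, together with the server and client logs, shows where in the file the timestamps first run past the end of the audio.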

Feb 05 '24 07:02 lyblsgo

> Currently, after passing through the ITN, timestamp misalignment occurs. The issue has been fixed and will be released in the next version.

The issue the original poster describes doesn't occur when running ASR through Python inference, though:

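from funasr import AutoModel

wavf = "path/to/audio.wav"  # placeholder: path to the input audio file
model = AutoModel(model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
                  model_revision="v2.0.4",
                  )
res = model.generate(input=wavf)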

However, using the Python code above, I did find cases where some sentences within a long audio file are misaligned by around 0.5 s. Could the ITN issue you mentioned be responsible for this?
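A drift like this can be quantified by comparing predicted sentence start times against reference times. The following is a rough sketch, assuming both lists hold one start time per sentence in milliseconds and are already aligned one-to-one (for example, from a manual annotation). The function name `drifted_segments` and the 300 ms default threshold are hypothetical choices, not part of FunASR.

```python
def drifted_segments(reference_ms, predicted_ms, threshold_ms=300):
    """Return (index, offset_ms) for sentences whose predicted start time
    differs from the reference by more than threshold_ms.

    A positive offset means the predicted start is later than the reference.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("reference and predicted lists must have equal length")
    return [
        (i, pred - ref)
        for i, (ref, pred) in enumerate(zip(reference_ms, predicted_ms))
        if abs(pred - ref) > threshold_ms
    ]
```

Listing the affected sentence indices and offsets would help show whether the misalignment clusters around sentences that contain normalized text such as numbers.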

Feb 14 '24 13:02 wincing2
