FunASR

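The MISS error rate is very high when using speaker diarization. Is the VAD module underperforming? Also, the DER with audio plus video is even worse than with audio only. Can this be fine-tuned?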

Open: Coconut059 opened this issue 1 year ago • 1 comment

Notice: To resolve issues more efficiently, please follow the template when raising an issue and fill in the details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Code

What have you tried?

What's your environment?

  • OS (e.g., Linux):
  • FunASR Version (e.g., 1.0.0):
  • ModelScope Version (e.g., 1.11.0):
  • PyTorch Version (e.g., 2.0.0):
  • How you installed funasr (pip, source):
  • Python version:
  • GPU (e.g., V100M32)
  • CUDA/cuDNN version (e.g., cuda11.7):
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
  • Any other relevant information:

Coconut059, Apr 11 '24 10:04

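On the MISP2022 dataset, speaker diarization with audio only gives a MISS rate of about 23% and a DER of 34%; with audio plus video, the DER is about 43%. Can the VAD module be fine-tuned? And can the clustering that incorporates video be fine-tuned as well?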
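For context on how these numbers relate: under the usual definition, DER is the sum of missed speech, false alarm, and speaker confusion, each divided by the total reference speech time. If the figures above follow that definition, a 23% MISS inside a 34% DER would leave roughly 11 points for false alarm plus confusion, which is why the VAD stage is the first suspect. The sketch below is only an illustration of that decomposition, not FunASR's scoring code. It assumes frame-level labels, no overlapping speech, no collar, and hypothesis speaker labels already mapped to the reference labels; the function name `der_breakdown` is hypothetical.

```python
from typing import Dict, Optional, Sequence


def der_breakdown(
    ref: Sequence[Optional[str]],
    hyp: Sequence[Optional[str]],
) -> Dict[str, float]:
    """Split DER into miss, false alarm, and confusion (frame-level sketch).

    Assumptions (not taken from the issue): one label per frame, None means
    silence, no overlapping speech, no collar, and hypothesis speaker labels
    are already mapped to reference labels.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")

    miss = false_alarm = confusion = total_speech = 0
    for r, h in zip(ref, hyp):
        if r is not None:
            total_speech += 1
            if h is None:
                miss += 1          # reference speech, system says silence
            elif h != r:
                confusion += 1     # both speech, but wrong speaker
        elif h is not None:
            false_alarm += 1       # reference silence, system says speech

    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    return {
        "miss": miss / total_speech,
        "false_alarm": false_alarm / total_speech,
        "confusion": confusion / total_speech,
        "der": (miss + false_alarm + confusion) / total_speech,
    }


if __name__ == "__main__":
    # Toy example: 10 frames, 8 of them reference speech.
    ref = ["A", "A", "A", "A", None, None, "B", "B", "B", "B"]
    hyp = ["A", "A", None, None, "A", None, "B", "A", "B", "B"]
    print(der_breakdown(ref, hyp))
```

Reporting the three components separately for the audio-only and audio-plus-video runs would show whether the extra DER in the audio-visual setup comes from missed speech or from speaker confusion in the clustering stage.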

Coconut059 avatar Apr 11 '24 10:04 Coconut059