FunASR
I can't find the data preparation recipe code for the SOND diarization model
Notice: In order to resolve issues more efficiently, please raise the issue following the template.
❓ Questions and Help
This issue is the same as https://github.com/modelscope/FunASR/issues/1916. I followed your suggestion, @LauraGPT, and referred to branch v0.8.8; the details are as follows:
Following https://github.com/modelscope/FunASR/blob/v0.8.8/egs/alimeeting/diarization/sond/run.sh, I can use the pretrained SOND model and the preprocessed AliMeeting test set to reproduce the DER of 4.12% on the AliMeeting test set claimed in the paper (https://arxiv.org/pdf/2211.10243). However, how is the preprocessed test data obtained? The relevant code is missing from the current folder.

I also looked at https://github.com/modelscope/FunASR/blob/v0.8.8/egs/alimeeting/modular_sa_asr/run_diar.sh. To obtain the speaker profile, it runs VBx for a first-pass diarization, then performs the rttm2segment and overlap-removal operations, extracts x-vectors on that basis, and finally performs the resegment_data operation. The segments and wav.scp expected in data_source_dir at this step are not clearly specified, so I made two assumptions:

- Assumption 1: they come directly from the segments and wav.scp produced by the pretrained VAD model; this yields a DER of 10.69% on the AliMeeting Eval set.
- Assumption 2: they come directly from the oracle segments (I ran https://github.com/modelscope/FunASR/blob/v0.8.8/egs/alimeeting/sa_asr/run.sh --stage 1 --stop-stage 1 to obtain data/org/Eval_Ali_far/{segments,wav.scp} as data_source_dir); this yields a DER of 10.14% on the AliMeeting Eval set.

Either way, the pipeline still obtains the speaker profile via BLSTM spectral clustering as described in the paper, yet the DER is far from the claimed result.
Could you help me figure out what is wrong and improve the result?
@ZhihaoDU Could you please help check this?
@ZhihaoDU, any comments?
You can obtain the standard speaker diarization files, such as wav.scp and rttm, with the official recipe of the AliMeeting competition. Then you can refer to the TOLD/SOAP recipe at https://github.com/modelscope/FunASR/blob/v0.8.8/egs/callhome/TOLD/soap/run.sh. Although that recipe is for CallHome, the data preparation can be shared between SOAP and SOND once you have the standard speaker diarization files. Note that while the oracle VAD information is used for AliMeeting at inference time, the CallHome results are based on VAD model outputs.
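For reference, the "standard speaker diarization files" mentioned above follow the usual Kaldi/RTTM conventions. A minimal illustration with made-up recording and speaker IDs (the paths and IDs below are examples, not AliMeeting outputs):

```shell
#!/bin/bash
# wav.scp: one line per recording, "<recording-id> <path-to-wav>"
cat > wav.scp <<'EOF'
R8001_M8004 /data/alimeeting/Eval_Ali_far/audio_dir/R8001_M8004.wav
EOF

# RTTM: "SPEAKER <rec-id> <chan> <onset> <dur> <NA> <NA> <spk-id> <NA> <NA>"
cat > ref.rttm <<'EOF'
SPEAKER R8001_M8004 1 12.35 4.81 <NA> <NA> spk01 <NA> <NA>
EOF

# Sanity check: every RTTM line should have exactly 10 whitespace-separated fields.
awk 'NF != 10 {bad++} END {print (bad ? "BAD" : "OK")}' ref.rttm
```

Malformed RTTM lines (wrong field count, zero durations) are a common source of silently wrong DER, so a field-count check like this is worth running before scoring.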
@ZhihaoDU, thanks for your reply. I followed your suggestion and prepared the script below for the AliMeeting Eval dataset, using the released SOND model and x-vector model. However, the final DER is very bad. I don't know where I went wrong; please point it out. I'll paste the shell code below.
Step 1: prepare wav.scp and RTTM files
#!/bin/bash
. ./path.sh || exit 1; # it sets up the Kaldi and FunASR environments.
stage=0
stop_stage=1000
. utils/parse_options.sh || exit 1;
## Because AliMeeting provides 8-channel far-field audio and we focus on a
## single-channel diarization system, we first extract a mono-channel waveform.
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ];then
input_dir=/mntcephfs/lab_data/maduo/datasets/alimeeting/Eval_Ali/Eval_Ali_far/audio_dir
output_dir=data/alimeeting_mono/Eval/audio_dir
python local/get_alimeeting_mono_audio.py \
$input_dir $output_dir
fi
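The script local/get_alimeeting_mono_audio.py is not shown in the thread; presumably it keeps one channel of each 8-channel far-field recording. A hedged shell equivalent, printed as a dry run (the `sox ... remix 1` form extracts the first channel; the demo paths are made up):

```shell
#!/bin/bash
# Dry-run sketch of the assumed mono extraction: print the sox command
# for each input wav instead of executing it.
input_dir=demo_in
output_dir=demo_out
mkdir -p "$input_dir" "$output_dir"
touch "$input_dir/R8001_M8004.wav"   # empty placeholder file for the dry run

for wav in "$input_dir"/*.wav; do
  base=$(basename "$wav")
  # "sox <in> <out> remix 1" keeps only channel 1 of a multi-channel file.
  echo sox "$wav" "$output_dir/$base" remix 1
done | tee cmds.txt
```

Dropping the `echo` (and using real audio) would perform the actual conversion; whether the Python script picks channel 1 or mixes channels is an assumption worth verifying.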
# refer from https://github.com/yufan-aslp/AliMeeting/blob/main/speaker/run.sh
if [ $stage -le 1 ] && [ ${stop_stage} -ge 1 ];then
echo "prepared alimeeting eval set ref rttm file"
textgrid_dir=/mntcephfs/lab_data/maduo/datasets/alimeeting/Eval_Ali/Eval_Ali_far/textgrid_dir
audio_dir=data/alimeeting_mono/Eval/audio_dir/
dest_dir=data/alimeeting_mono/Eval/v9
work_dir=$dest_dir/.work
mkdir -p $work_dir
find -L $audio_dir -name "*.wav" > $work_dir/wavlist
sort $work_dir/wavlist > $work_dir/tmp
cp $work_dir/tmp $work_dir/wavlist
awk -F '/' '{print $NF}' $work_dir/wavlist | awk -F '.' '{print $1}' > $work_dir/uttid
find -L $textgrid_dir -iname "*.TextGrid" > $work_dir/textgrid.flist
sort $work_dir/textgrid.flist > $work_dir/tmp
cp $work_dir/tmp $work_dir/textgrid.flist
paste $work_dir/uttid $work_dir/textgrid.flist > $work_dir/uttid_textgrid.flist
paste $work_dir/uttid $work_dir/wavlist > $dest_dir/wav.scp
paste $work_dir/uttid $work_dir/uttid > $work_dir/utt2spk
cp $work_dir/utt2spk $work_dir/spk2utt
cp $work_dir/uttid $work_dir/text
while read line;do
text_grid=`echo $line | awk '{print $1}'`
text_grid_path=`echo $line | awk '{print $2}'`
echo "text_grid: $text_grid"
echo "text_grid_path: ${text_grid_path}"
python3 local/make_textgrid_rttm.py\
--input_textgrid_file $text_grid_path \
--uttid $text_grid \
--output_rttm_file $work_dir/${text_grid}.rttm
done < $work_dir/uttid_textgrid.flist
#dest_dir=data/alimeeting_mono/Eval
cat $work_dir/*.rttm > $dest_dir/alimeeting_eval.rttm
cat $dest_dir/alimeeting_eval.rttm > $dest_dir/ref.rttm
mv $work_dir/{spk2utt,utt2spk,text} $dest_dir/
fi
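After stage 1, every recording in wav.scp should appear in ref.rttm and vice versa; a mismatch here propagates into every later stage. A self-contained check on toy data (file names and IDs are illustrative):

```shell
#!/bin/bash
# Toy stand-ins for the stage-1 outputs.
printf 'R8001_M8004 /path/a.wav\nR8002_M8005 /path/b.wav\n' > wav_demo.scp
printf 'SPEAKER R8001_M8004 1 0.00 1.00 <NA> <NA> spk1 <NA> <NA>\nSPEAKER R8002_M8005 1 0.50 2.00 <NA> <NA> spk2 <NA> <NA>\n' > ref_demo.rttm

# Compare the recording-id sets: column 1 of wav.scp vs column 2 of the RTTM.
diff <(awk '{print $1}' wav_demo.scp | sort -u) \
     <(awk '{print $2}' ref_demo.rttm | sort -u) \
  && echo "wav.scp and ref.rttm cover the same recordings"
```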
Step 2: get non-overlap segments via the ref.rttm file
datadir=data/alimeeting_mono
version=v9
dumpdir=dump
expdir=exp
train_cmd=utils/run.pl
sr=16000
nj=8
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
echo "Stage 2: Extract non-overlap segments from alimeeting eval dataset"
for dset in Eval ; do
echo "Stage 2: Extracting non-overlap segments for "${dset}
mkdir -p ${dumpdir}/${dset}/nonoverlap_0s
python3 -Wignore script/extract_nonoverlap_segments.py \
${datadir}/${dset}/${version}/wav.scp ${datadir}/${dset}/${version}/ref.rttm ${dumpdir}/${dset}/${version}/nonoverlap_0s \
--min_dur 0.1 --max_spk_num 4 --sr ${sr} --no_pbar --nj ${nj}
mkdir -p ${datadir}/${dset}/${version}/nonoverlap_0s
find ${dumpdir}/${dset}/${version}/nonoverlap_0s/ -iname "*.wav" | sort | awk -F'[/.]' '{print $(NF-1),$0}' > ${datadir}/${dset}/${version}/nonoverlap_0s/wav.scp
awk -F'[/.]' '{print $(NF-1),$(NF-2)}' ${datadir}/${dset}/${version}/nonoverlap_0s/wav.scp > ${datadir}/${dset}/${version}/nonoverlap_0s/utt2spk
echo "Done."
done
fi
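The awk field extraction in stage 2 assumes the dumped segment paths look like `.../<speaker>/<utt>.wav`: splitting on both `/` and `.`, field NF-1 is the utterance stem and field NF-2 is the speaker directory. A quick demonstration on a made-up path:

```shell
#!/bin/bash
# Same -F'[/.]' splitting as in stage 2; the path below is illustrative.
echo "dump/Eval/nonoverlap_0s/spk01/R8001_M8004_seg001.wav" |
  awk -F'[/.]' '{print $(NF-1), $(NF-2)}'
```

If extract_nonoverlap_segments.py writes a different directory layout (e.g. no per-speaker subdirectory), the derived utt2spk will silently map every utterance to the wrong speaker, which alone can ruin the profiles.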
Step 3: extract 80-dimensional fbank features
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
echo "Stage 5: Generate fbank features"
#home_path=`pwd`
#cd ${kaldi_root}/egs/callhome_diarization/v2 || exit
#. ./cmd.sh
. ./path.sh
for dset in Eval; do
steps/make_fbank.sh --write-utt2num-frames true --fbank-config conf/fbank_16k.conf --nj ${nj} --cmd "$train_cmd" \
${datadir}/${dset}/${version} ${expdir}/make_fbank/${dset}/${version} ${dumpdir}/${dset}/${version}/fbank
utils/fix_data_dir.sh ${datadir}/${dset}/${version}
done
for dset in Eval/${version}/nonoverlap_0s; do
steps/make_fbank.sh --write-utt2num-frames true --fbank-config conf/fbank_16k.conf --nj ${nj} --cmd "$train_cmd" \
${datadir}/${dset} ${expdir}/make_fbank/${dset} ${dumpdir}/${dset}/fbank
utils/fix_data_dir.sh ${datadir}/${dset}
done
#cd ${home_path} || exit
fi
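The file conf/fbank_16k.conf is not shown in the thread. For an 80-dimensional 16 kHz setup, a typical Kaldi `compute-fbank-feats` config would look like the following (assumed contents; verify against the config shipped with the SOND recipe, since a bin-count or sample-rate mismatch with the pretrained model would badly hurt DER):

```shell
# conf/fbank_16k.conf (assumed, not from the repo)
--sample-frequency=16000
--num-mel-bins=80
--frame-length=25
--frame-shift=10
```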
Step 4: extract x-vector speaker embeddings
if [ $stage -le 6 ] && [ ${stop_stage} -ge 6 ]; then
echo "download the pretrained x-vector speaker model"
#expdir=exp
git lfs install
git clone https://www.modelscope.cn/iic/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch.git
mv speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch ${expdir}/
fi
infer_cmd=utils/run.pl
inference_nj=4
# number of jobs for inference
# for gpu decoding, inference_nj=ngpu*njob; for cpu decoding, inference_nj=njob
njob=4
ngpu=1
inference_nj=$((ngpu * njob))
_ngpu=1
gpuid_list="0"
if [ $stage -le 7 ] && [ ${stop_stage} -ge 7 ]; then
sv_exp_dir=$expdir/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch
sed "s/input_size: null/input_size: 80/g" ${sv_exp_dir}/sv.yaml > ${sv_exp_dir}/sv_fbank.yaml
for dset in Eval/${version}/nonoverlap_0s; do
key_file=${datadir}/${dset}/feats.scp
num_scp_file="$(<${key_file} wc -l)"
_nj=$([ $inference_nj -le $num_scp_file ] && echo "$inference_nj" || echo "$num_scp_file")
_logdir=${dumpdir}/${dset}/xvecs
mkdir -p ${_logdir}
split_scps=
for n in $(seq "${_nj}"); do
split_scps+=" ${_logdir}/keys.${n}.scp"
done
# shellcheck disable=SC2086
utils/split_scp.pl "${key_file}" ${split_scps}
${infer_cmd} --gpu "${_ngpu}" --max-jobs-run "${_nj}" JOB=1:"${_nj}" "${_logdir}"/sv_inference.JOB.log \
python3 -m funasr.bin.sv_inference_launch \
--batch_size 1 \
--njob ${njob} \
--ngpu "${_ngpu}" \
--gpuid_list ${gpuid_list} \
--data_path_and_name_and_type "${key_file},speech,kaldi_ark" \
--key_file "${_logdir}"/keys.JOB.scp \
--sv_train_config ${sv_exp_dir}/sv_fbank.yaml \
--sv_model_file ${sv_exp_dir}/sv.pth \
--output_dir "${_logdir}"/output.JOB
cat ${_logdir}/output.*/xvector.scp | sort > ${datadir}/${dset}/utt2xvec
done
fi
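After x-vector extraction, every utterance in the nonoverlap feats.scp should have an embedding in utt2xvec; keys dropped by failed inference jobs silently shrink the speaker profiles. A toy coverage check (file names here are stand-ins for the real ones under data/.../nonoverlap_0s):

```shell
#!/bin/bash
# Toy data: utt2 has features but no x-vector.
printf 'utt1 a.ark:1\nutt2 a.ark:9\n' > feats_xv_demo.scp
printf 'utt1 xv.ark:1\n' > utt2xvec_demo

# comm -23 prints keys present in feats.scp but missing from utt2xvec.
comm -23 <(awk '{print $1}' feats_xv_demo.scp | sort) \
         <(awk '{print $1}' utt2xvec_demo | sort)
```

An empty output here means full coverage; any printed key should be traced back to its sv_inference log before moving on.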
if [ ${stage} -le 8 ] && [ ${stop_stage} -ge 8 ]; then
echo "Stage 8: Generate label files."
for dset in Eval/${version}; do
echo "Stage 8: Generate labels for ${dset}."
python3 -Wignore script/calc_real_meeting_frame_labels.py \
${datadir}/${dset} ${dumpdir}/${dset}/labels \
--n_spk 4 --frame_shift 0.01 --nj $nj --sr $sr
find `pwd`/${dumpdir}/${dset}/labels/ -iname "*.lbl.mat" | awk -F'[/.]' '{print $(NF-2),$0}' | sort > ${datadir}/${dset}/labels.scp
done
fi
if [ ${stage} -le 9 ] && [ ${stop_stage} -ge 9 ];then
# dump alimeeting eval data in test mode.
data_dir=${datadir}/Eval/${version}/files_for_dump
mkdir -p ${data_dir}
# filter out zero duration segments
LC_ALL=C awk '{if ($5 > 0){print $0}}' ${datadir}/Eval/${version}/ref.rttm > ${data_dir}/ref.rttm
cp ${datadir}/Eval/${version}/{feats.scp,labels.scp} ${data_dir}/
cp ${datadir}/Eval/${version}/nonoverlap_0s/{utt2spk,utt2xvec,utt2num_frames} ${data_dir}/
echo "Stage 9: start to dump for alimeeting."
python3 -Wignore script/dump_meeting_chunks.py --dir ${data_dir} \
--out ${dumpdir}/Eval/${version}/dumped_files/data --n_spk 16 --no_pbar --sr $sr --mode test \
--chunk_size 1600 --chunk_shift 400 --add_mid_to_speaker true
mkdir -p ${datadir}/Eval/${version}/dumped_files
cat ${dumpdir}/Eval/${version}/dumped_files/data_parts*_feat.scp | sort > ${datadir}/Eval/${version}/dumped_files/feats.scp
cat ${dumpdir}/Eval/${version}/dumped_files/data_parts*_xvec.scp | sort > ${datadir}/Eval/${version}/dumped_files/profile.scp
cat ${dumpdir}/Eval/${version}/dumped_files/data_parts*_label.scp | sort > ${datadir}/Eval/${version}/dumped_files/label.scp
mkdir -p ${expdir}/alimeeting_eval_states
awk '{print $1,"1600"}' ${datadir}/Eval/${version}/dumped_files/feats.scp | shuf > ${expdir}/alimeeting_eval_states/speech_shape
python3 -Wignore script/convert_rttm_to_seg_file.py --rttm_scp ${data_dir}/ref.rttm --seg_file ${data_dir}/org_vad.txt
fi
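SOND inference pairs speech, profile, and label by key, so the three scp files produced by the dump stage must share exactly the same keys in the same order. A toy alignment check (file names are stand-ins for the real dumped_files/*.scp):

```shell
#!/bin/bash
# Toy stand-ins for the dumped chunk-level scp files.
printf 'chunk1 f.ark:1\nchunk2 f.ark:9\n' > feats_chunk.scp
printf 'chunk1 p.ark:1\nchunk2 p.ark:9\n' > profile_chunk.scp

# cmp -s exits non-zero on the first differing key.
cmp -s <(awk '{print $1}' feats_chunk.scp) <(awk '{print $1}' profile_chunk.scp) \
  && echo "keys aligned" || echo "KEY MISMATCH"
```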
# evaluate for pretrained model
if [ ${stage} -le 11 ] && [ ${stop_stage} -ge 11 ]; then
echo "stage 11: evaluation for phase-1 model."
test_sets=Eval/${version}
# inference related
inference_model=sond.pb # officially released SOND model
#inference_config=exp/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch/sond_fbank.yaml
model_dir=speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch
for dset in ${test_sets}; do
echo "Processing for $dset"
exp_model_dir=${expdir}/${model_dir}
#_inference_tag="$(basename "${inference_config}" .yaml)${inference_tag}"
#_dir="${exp_model_dir}/${_inference_tag}/${inference_model}/${dset}"
_dir=${exp_model_dir}/${dset}
_logdir="${_dir}/logdir"
if [ -d ${_dir} ]; then
echo "WARNING: ${_dir} already exists."
fi
mkdir -p "${_logdir}"
_data="${datadir}/${dset}/dumped_files"
key_file=${_data}/feats.scp
num_scp_file="$(<${key_file} wc -l)"
_nj=$([ $inference_nj -le $num_scp_file ] && echo "$inference_nj" || echo "$num_scp_file")
split_scps=
for n in $(seq "${_nj}"); do
split_scps+=" ${_logdir}/keys.${n}.scp"
done
_opt=
if [ ! -z "${inference_config}" ]; then
_opt="--config ${inference_config}"
fi
# shellcheck disable=SC2086
utils/split_scp.pl "${key_file}" ${split_scps}
echo "Inference log can be found at ${_logdir}/inference.*.log"
${infer_cmd} --gpu "${_ngpu}" --max-jobs-run "${_nj}" JOB=1:"${_nj}" "${_logdir}"/inference.JOB.log \
python3 -m funasr.bin.diar_inference_launch \
--batch_size 1 \
--ngpu "${_ngpu}" \
--njob ${njob} \
--gpuid_list ${gpuid_list} \
--data_path_and_name_and_type "${_data}/feats.scp,speech,kaldi_ark" \
--data_path_and_name_and_type "${_data}/profile.scp,profile,kaldi_ark" \
--key_file "${_logdir}"/keys.JOB.scp \
--diar_train_config "${exp_model_dir}"/sond_fbank.yaml \
--diar_model_file "${exp_model_dir}"/"${inference_model}" \
--output_dir "${_logdir}"/output.JOB \
--mode sond
#${_opt}
done
fi
if [ ${stage} -le 12 ] && [ ${stop_stage} -ge 12 ]; then
echo "stage 12: Scoring phase-1 models"
if [ ! -e dscore ]; then
git clone https://github.com/nryant/dscore.git
# add intervaltree to setup.py
fi
fi
if [ ${stage} -le 13 ] && [ ${stop_stage} -ge 13 ]; then
test_sets=Eval/${version}
model_dir=speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch
for dset in ${test_sets}; do
echo "stage 13: Scoring for ${dset}"
diar_exp=${expdir}/${model_dir}
_data="${datadir}/${dset}"
#_inference_tag="$(basename "${inference_config}" .yaml)${inference_tag}"
#_dir="${diar_exp}/${_inference_tag}/${inference_model}/${dset}"
_dir=${diar_exp}/${dset}
_logdir="${_dir}/logdir"
cat ${_logdir}/*/labels.txt | sort > ${_dir}/labels.txt
python3 script/convert_label_to_rttm.py \
${_dir}/labels.txt \
${datadir}/${dset}/files_for_dump/org_vad.txt \
${_dir}/sys.rttm \
--ignore_len 10 \
--no_pbar \
--smooth_size 83 \
--vote_prob 0.5 \
--n_spk 16
# echo ${cmd}
#eval ${cmd}
ref=${datadir}/${dset}/files_for_dump/ref.rttm
sys=${_dir}/sys.rttm.ref_vad
#OVAD_DER=$(python3 -Wignore dscore/score.py -r $ref -s $sys --collar 0.25 2>&1 | grep OVERALL | awk '{print $4}')
python3 -Wignore dscore/score.py -r $ref -s $sys --collar 0.25
ref=${datadir}/${dset}/files_for_dump/ref.rttm
sys=${_dir}/sys.rttm.sys_vad
#SysVAD_DER=$(python3 -Wignore dscore/score.py -r $ref -s $sys --collar 0.25 2>&1 | grep OVERALL | awk '{print $4}')
python3 -Wignore dscore/score.py -r $ref -s $sys --collar 0.25
#echo -e "${inference_model} ${OVAD_DER} ${SysVAD_DER}" | tee -a ${_dir}/results.txt
done
fi
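The commented-out lines in stage 13 suggest DER is pulled from dscore's summary row with `grep OVERALL | awk '{print $4}'`. That assumes dscore's table puts DER in the fourth whitespace field of the `*** OVERALL ***` row, which may differ across dscore versions; a toy demonstration of the extraction (the log line and numbers are fabricated for illustration):

```shell
#!/bin/bash
# Fake one-line dscore summary row: "*** OVERALL *** <DER> <JER>".
printf '*** OVERALL *** 12.34 25.31\n' > dscore_demo.log

# Fields: $1="***", $2="OVERALL", $3="***", $4=DER.
awk '/OVERALL/ {print $4}' dscore_demo.log
```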
@ZhihaoDU, any comments?
@ZhihaoDU, any further hints?