
How to pass custom hotwords when training seaco-paraformer

Open yuql-sea opened this issue 6 months ago • 0 comments

Notice: To resolve issues more efficiently, please raise your issue following the template and fill in the details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

When fine-tuning seaco paraformer, how can I add my own hotwords for training? I could not find a way to pass hotwords in either the finetune script or train_ds.py.

Code

torchrun $DISTRIBUTED_ARGS \
../../../funasr/bin/train_ds.py \
++model="${model_name_or_model_dir}" \
++train_data_set_list="${train_data}" \
++valid_data_set_list="${val_data}" \
++dataset="AudioDatasetHotword" \
++dataset_conf.index_ds="IndexDSJsonl" \
++dataset_conf.data_split_num=1 \
++dataset_conf.batch_sampler="BatchSampler" \
++dataset_conf.batch_size=6000 \
++dataset_conf.sort_size=1024 \
++dataset_conf.batch_type="token" \
++dataset_conf.num_workers=4 \
++train_conf.max_epoch=50 \
++train_conf.log_interval=1 \
++train_conf.resume=true \
++train_conf.validate_interval=2000 \
++train_conf.save_checkpoint_interval=2000 \
++train_conf.avg_keep_nbest_models_type='loss' \
++train_conf.keep_nbest_models=20 \
++train_conf.avg_nbest_model=10 \
++train_conf.use_deepspeed=false \
++train_conf.deepspeed_config=${deepspeed_config} \
++train_conf.find_unused_parameters=true \
++optim_conf.lr=0.0002 \
++output_dir="${output_dir}" &> ${log_file}

What have you tried?

What's your environment?

  • OS (e.g., Linux): Linux
  • FunASR Version (e.g., 1.0.0): 1.2.6
  • ModelScope Version (e.g., 1.11.0): 1.23.2
  • PyTorch Version (e.g., 2.0.0): 2.4.0
  • How you installed funasr (pip, source): source
  • Python version: 3.9
  • GPU (e.g., V100M32): 4090
  • CUDA/cuDNN version (e.g., cuda11.7): 12.1
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1): no
  • Any other relevant information:

yuql-sea • Apr 21 '25 06:04