FunASR
How to pass custom hotwords when training seaco-paraformer
Notice: In order to resolve issues more efficiently, please raise issues following the template and fill in the details.
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
When fine-tuning seaco paraformer, how can I include my own hotwords in training? I could not find any way to pass hotwords in either the finetune script or train_ds.py.
Code
torchrun $DISTRIBUTED_ARGS \
../../../funasr/bin/train_ds.py \
++model="${model_name_or_model_dir}" \
++train_data_set_list="${train_data}" \
++valid_data_set_list="${val_data}" \
++dataset="AudioDatasetHotword" \
++dataset_conf.index_ds="IndexDSJsonl" \
++dataset_conf.data_split_num=1 \
++dataset_conf.batch_sampler="BatchSampler" \
++dataset_conf.batch_size=6000 \
++dataset_conf.sort_size=1024 \
++dataset_conf.batch_type="token" \
++dataset_conf.num_workers=4 \
++train_conf.max_epoch=50 \
++train_conf.log_interval=1 \
++train_conf.resume=true \
++train_conf.validate_interval=2000 \
++train_conf.save_checkpoint_interval=2000 \
++train_conf.avg_keep_nbest_models_type='loss' \
++train_conf.keep_nbest_models=20 \
++train_conf.avg_nbest_model=10 \
++train_conf.use_deepspeed=false \
++train_conf.deepspeed_config=${deepspeed_config} \
++train_conf.find_unused_parameters=true \
++optim_conf.lr=0.0002 \
++output_dir="${output_dir}" &> ${log_file}
What have you tried?
What's your environment?
- OS (e.g., Linux):Linux
- FunASR Version (e.g., 1.0.0):1.2.6
- ModelScope Version (e.g., 1.11.0):1.23.2
- PyTorch Version (e.g., 2.0.0):2.4.0
- How you installed funasr (pip, source):source
- Python version:3.9
- GPU (e.g., V100M32):4090
- CUDA/cuDNN version (e.g., cuda11.7):12.1
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):no
- Any other relevant information: