FunASR
Multi-node multi-GPU Paraformer training crashes with an NCCL error
Notice: In order to resolve issues more efficiently, please raise the issue following the template.
🐛 Bug
Command executed:

```shell
torchrun --nnodes 2 --node_rank 0 --nproc_per_node ${gpu_num} --master_addr ******* --master_port 1234 \
    ../../../funasr/bin/train_ds.py \
    ++model="${model_name_or_model_dir}" \
    ++train_data_set_list="${train_data}" \
    ++valid_data_set_list="${val_data}" \
    ++dataset="AudioDataset" \
    ++dataset_conf.index_ds="IndexDSJsonl" \
    ++dataset_conf.data_split_num=1 \
    ++dataset_conf.batch_sampler="BatchSampler" \
    ++dataset_conf.batch_size=6000 \
    ++dataset_conf.sort_size=1024 \
    ++dataset_conf.batch_type="token" \
    ++dataset_conf.num_workers=12 \
    ++train_conf.max_epoch=200 \
    ++train_conf.log_interval=100 \
    ++train_conf.resume=true \
    ++train_conf.validate_interval=5000 \
    ++train_conf.save_checkpoint_interval=5000 \
    ++train_conf.keep_nbest_models=50 \
    ++train_conf.avg_nbest_model=10 \
    ++train_conf.use_deepspeed=true \
    ++train_conf.deepspeed_config=${deepspeed_config} \
    ++optim_conf.lr=0.0008 \
    ++output_dir="${output_dir}" &> ${log_file}
```
After a few hundred training steps, the job crashes with the following error:
liangxianchen-asr-2wh-pretrain1-m-0:174396:175238 [0] misc/socket.cc:538 NCCL WARN Net : Connection closed by remote peer liangxianchen-asr-2wh-pretrain1-w-0.liangxianchen-asr-2wh-pretrain1.prdsafe.svc.hbox2-zzzc2-prd.local<48836>
liangxianchen-asr-2wh-pretrain1-m-0:174396:175238 [0] NCCL INFO transport/net_socket.cc:493 -> 6
liangxianchen-asr-2wh-pretrain1-m-0:174396:175238 [0] NCCL INFO include/net.h:35 -> 6
liangxianchen-asr-2wh-pretrain1-m-0:174396:175238 [0] NCCL INFO transport/net.cc:1034 -> 6
liangxianchen-asr-2wh-pretrain1-m-0:174396:175238 [0] NCCL INFO proxy.cc:520 -> 6
liangxianchen-asr-2wh-pretrain1-m-0:174396:175238 [0] NCCL INFO proxy.cc:684 -> 6 [Proxy Thread]
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error: remote process exited or there was a network error, NCCL version 2.14.3
ncclRemoteError: A call failed possibly due to a network error or a remote process exiting prematurely.
Last error:
Net : Connection closed by remote peer liangxianchen-asr-2wh-pretrain1-w-0.liangxianchen-asr-2wh-pretrain1.prdsafe.svc.hbox2-zzzc2-prd.local<48836>
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 174397 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 174398 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 174399 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 174400 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 174401 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 174402 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 174403 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 174396) of binary: /mnt/liangxianchen/anaconda3/envs/python38/bin/python
Traceback (most recent call last):
File "/mnt/liangxianchen/anaconda3/envs/python38/bin/torchrun", line 8, in <module>
Additional observations: when I initially set batch_size=7000, the error above appeared around step 300; with batch_size=6000 it appeared around step 1000. After I increased the timeout argument of torch.distributed.init_process_group(), the batch_size=6000 run did not hit the error until around step 5000.
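For reference, the timeout change described above can be sketched as follows. This is a minimal illustration, not FunASR's actual code: the function names `nccl_timeout` and `init_distributed` are hypothetical, and since FunASR's launcher calls `init_process_group` internally, in practice the `timeout=` argument has to be added where FunASR (or your patched copy of it) initializes the process group.

```python
from datetime import timedelta


def nccl_timeout(minutes: int) -> timedelta:
    # Explicit helper so the chosen timeout is easy to see and log.
    return timedelta(minutes=minutes)


def init_distributed(timeout_minutes: int = 120) -> None:
    # NCCL's default collective timeout is 10 minutes; a straggling rank on a
    # large token batch can exceed it, after which the watchdog tears down
    # every rank. The 120-minute default here is illustrative only.
    import torch.distributed as dist  # imported lazily; requires torch

    dist.init_process_group(backend="nccl", timeout=nccl_timeout(timeout_minutes))
```

Note that a longer timeout only delays the failure if one rank is genuinely stuck; it mainly helps when the slowest rank legitimately needs more time on large batches.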
Show me the full logfile.
Same problem, and the full logfile:

tail: log.txt: file truncated
W1105 15:01:33.051000 140432080086848 torch/distributed/run.py:779]
W1105 15:01:33.051000 140432080086848 torch/distributed/run.py:779] *****************************************
W1105 15:01:33.051000 140432080086848 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1105 15:01:33.051000 140432080086848 torch/distributed/run.py:779] *****************************************
Key Conformer already exists in model_classes, re-register (repeated once per rank)
Key Linear already exists in adaptor_classes, re-register (repeated once per rank)
Key TransformerDecoder / LightweightConvolutionTransformerDecoder / LightweightConvolution2DTransformerDecoder / DynamicConvolutionTransformerDecoder / DynamicConvolution2DTransformerDecoder already exists in decoder_classes, re-register (repeated once per rank)
[2024-11-05 15:01:38,324][root][INFO] - download models from model hub: ms (repeated once per rank)
[2024-11-05 15:01:38,357][root][INFO] - use_ddp: True, use_fsdp: False (repeated once per rank)
tables:
[FunASR class-registry tables (dataset_classes, batch_sampler_classes, index_ds_classes, preprocessor_classes, dataloader_classes, frontend_classes, joint_network_classes, model_classes, predictor_classes, encoder_classes, decoder_classes, adaptor_classes, normalize_classes, specaug_classes, lid_predictor_classes, tokenizer_classes) omitted]
[2024-11-05 15:01:38,459][root][INFO] - use_ddp: True, use_fsdp: False
[2024-11-05 15:01:38,464][root][INFO] - use_ddp: True, use_fsdp: False
[2024-11-05 15:01:38,759][root][INFO] - Build model, frontend, tokenizer
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
[2024-11-05 15:01:38,885][root][INFO] - Build model, frontend, tokenizer
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
[2024-11-05 15:01:38,958][root][INFO] - Build model, frontend, tokenizer
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
[2024-11-05 15:01:39,074][root][INFO] - Build model, frontend, tokenizer
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
[2024-11-05 15:01:39,091][root][INFO] - Build model, frontend, tokenizer
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
[2024-11-05 15:01:39,123][root][INFO] - Build model, frontend, tokenizer
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
[2024-11-05 15:01:39,148][root][INFO] - Build model, frontend, tokenizer
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
You are using the latest version of funasr-1.1.14
You are using the latest version of funasr-1.1.14
You are using the latest version of funasr-1.1.14
You are using the latest version of funasr-1.1.14
You are using the latest version of funasr-1.1.14
You are using the latest version of funasr-1.1.14
[2024-11-05 15:01:41,831][root][INFO] - Loading pretrained params from /nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-11-05 15:01:41,837][root][INFO] - ckpt: /nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
/nvme2/chaoshan/FunASR/funasr/train_utils/load_pretrained_model.py:39: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ori_state = torch.load(path, map_location=map_location)
Error executing job with overrides: ['++model=/nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch', '++train_data_set_list=/nvme2/chaoshanData/1//train_new.jsonl', '++valid_data_set_list=/nvme2/chaoshanData/1//val.jsonl', '++dataset=AudioDataset', '++dataset_conf.index_ds=IndexDSJsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=2000', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=4', '++train_conf.max_epoch=50', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=20', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/../../ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./output']
[rank0]: Traceback (most recent call last):
[rank0]: File "/nvme2/chaoshan/FunASR/funasr/bin/train_ds.py", line 225, in <module>
[rank0]: main_hydra()
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
[rank0]: _run_hydra(
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
[rank0]: _run_app(
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
[rank0]: run_and_report(
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
[rank0]: raise ex
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
[rank0]: return func()
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
[rank0]: lambda: hydra.run(
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
[rank0]: _ = ret.return_value
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
[rank0]: raise self._return_value
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
[rank0]: ret.return_value = task_function(task_cfg)
[rank0]: File "/nvme2/chaoshan/FunASR/funasr/bin/train_ds.py", line 56, in main_hydra
[rank0]: main(**kwargs)
[rank0]: File "/nvme2/chaoshan/FunASR/funasr/bin/train_ds.py", line 95, in main
[rank0]: model = AutoModel(**kwargs)
[rank0]: File "/nvme2/chaoshan/FunASR/funasr/auto/auto_model.py", line 125, in __init__
[rank0]: model, kwargs = self.build_model(**kwargs)
[rank0]: File "/nvme2/chaoshan/FunASR/funasr/auto/auto_model.py", line 270, in build_model
[rank0]: load_pretrained_model(
[rank0]: File "/nvme2/chaoshan/FunASR/funasr/train_utils/load_pretrained_model.py", line 39, in load_pretrained_model
[rank0]: ori_state = torch.load(path, map_location=map_location)
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/serialization.py", line 1114, in load
[rank0]: return _legacy_load(
[rank0]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/serialization.py", line 1338, in _legacy_load
[rank0]: magic_number = pickle_module.load(f, **pickle_load_args)
[rank0]: _pickle.UnpicklingError: invalid load key, 'v'.
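(A likely cause of the `_pickle.UnpicklingError: invalid load key, 'v'` above is that `model.pt` is not a real checkpoint but a git-lfs pointer file, i.e. a small text file beginning with `version https://git-lfs...`, hence the leading `'v'`. A quick check, sketched as a hypothetical helper:)

```python
def looks_like_lfs_pointer(path: str) -> bool:
    """Heuristic: a git-lfs pointer is a small text file starting with
    "version https://git-lfs...", while a real PyTorch checkpoint is a zip
    archive (magic bytes "PK\\x03\\x04") or a legacy pickle stream."""
    with open(path, "rb") as f:
        return f.read(7) == b"version"
```

If this returns True for your `model.pt`, re-download the model (e.g. via `git lfs pull` or the model hub) before training.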
[rank0]:[W1105 15:01:42.300788901 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
[2024-11-05 15:01:42,408][root][INFO] - Loading pretrained params from /nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-11-05 15:01:42,414][root][INFO] - ckpt: /nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
/nvme2/chaoshan/FunASR/funasr/train_utils/load_pretrained_model.py:39: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ori_state = torch.load(path, map_location=map_location)
Error executing job with overrides: ['++model=/nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch', '++train_data_set_list=/nvme2/chaoshanData/1//train_new.jsonl', '++valid_data_set_list=/nvme2/chaoshanData/1//val.jsonl', '++dataset=AudioDataset', '++dataset_conf.index_ds=IndexDSJsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=2000', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=4', '++train_conf.max_epoch=50', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=20', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/../../ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./output']
[2024-11-05 15:01:42,415][root][INFO] - Loading pretrained params from /nvme2/chaoshan/FunASR/examples/industrial_data_pretraining/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[rank6]: Traceback (most recent call last):
[rank6]: File "/nvme2/chaoshan/FunASR/funasr/bin/train_ds.py", line 225, in
[rank6]: main_hydra()
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
[rank6]: _run_hydra(
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
[rank6]: _run_app(
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
[rank6]: run_and_report(
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
[rank6]: raise ex
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
[rank6]: return func()
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in
[rank6]: lambda: hydra.run(
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
[rank6]: _ = ret.return_value
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
[rank6]: raise self._return_value
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
[rank6]: ret.return_value = task_function(task_cfg)
[rank6]: File "/nvme2/chaoshan/FunASR/funasr/bin/train_ds.py", line 56, in main_hydra
[rank6]: main(**kwargs)
[rank6]: File "/nvme2/chaoshan/FunASR/funasr/bin/train_ds.py", line 95, in main
[rank6]: model = AutoModel(**kwargs)
[rank6]: File "/nvme2/chaoshan/FunASR/funasr/auto/auto_model.py", line 125, in init
[rank6]: model, kwargs = self.build_model(**kwargs)
[rank6]: File "/nvme2/chaoshan/FunASR/funasr/auto/auto_model.py", line 270, in build_model
[rank6]: load_pretrained_model(
[rank6]: File "/nvme2/chaoshan/FunASR/funasr/train_utils/load_pretrained_model.py", line 39, in load_pretrained_model
[rank6]: ori_state = torch.load(path, map_location=map_location)
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/serialization.py", line 1114, in load
[rank6]: return _legacy_load(
[rank6]: File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/serialization.py", line 1338, in _legacy_load
[rank6]: magic_number = pickle_module.load(f, **pickle_load_args)
[rank6]: _pickle.UnpicklingError: invalid load key, 'v'.
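The error `_pickle.UnpicklingError: invalid load key, 'v'.` means `torch.load` found a file that is not a pickle at all. A load key of `'v'` very often indicates a Git LFS pointer file (plain text beginning with `version https://git-lfs...`) that was checked out in place of the real weights on one node. A minimal diagnostic sketch, assuming the checkpoint path from the logs (the helper name `inspect_checkpoint` is illustrative):

```python
# "invalid load key, 'v'" usually means model.pt is not a real checkpoint.
# This helper peeks at the first bytes to classify what the file actually is.
def inspect_checkpoint(path: str) -> str:
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"version https://git-lfs"):
        return "git-lfs pointer (run `git lfs pull` to fetch the real weights)"
    if head.startswith(b"PK"):
        return "zip archive (modern torch.save format)"
    if head[:1] == b"\x80":
        return "legacy pickle (old torch.save format)"
    return "unknown format: " + repr(head[:16])
```

Running this against `model.pt` on every node should show which machine has a bad copy; a healthy checkpoint reports a zip archive or legacy pickle.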
[rank3]: (identical traceback to rank6 above, ending in _pickle.UnpicklingError: invalid load key, 'v'.)
W1105 15:01:42.683000 140432080086848 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 2610403 closing signal SIGTERM
W1105 15:01:42.684000 140432080086848 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 2610404 closing signal SIGTERM
W1105 15:01:42.684000 140432080086848 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 2610405 closing signal SIGTERM
W1105 15:01:42.684000 140432080086848 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 2610406 closing signal SIGTERM
W1105 15:01:42.684000 140432080086848 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 2610407 closing signal SIGTERM
W1105 15:01:42.685000 140432080086848 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 2610408 closing signal SIGTERM
E1105 15:01:43.013000 140432080086848 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 2610402) of binary: /home/tx/anaconda3/envs/mt/bin/python
Traceback (most recent call last):
File "/home/tx/anaconda3/envs/mt/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.4.1', 'console_scripts', 'torchrun')())
File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(*args, **kwargs)
File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/tx/anaconda3/envs/mt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/nvme2/chaoshan/FunASR/funasr/bin/train_ds.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-11-05_15:01:42
  host      : ubuntu
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2610402)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
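Since only some ranks fail while loading the same `model.pt`, comparing the file's checksum across machines would confirm whether one node has a corrupt or partially copied checkpoint. A sketch using only the standard library (run it on each node and compare the digests; the helper name `file_sha256` is illustrative):

```python
import hashlib

# Stream the checkpoint in chunks so large model files don't need to fit in RAM.
def file_sha256(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()
```

If the digests differ between nodes, re-copying `model.pt` to the mismatched machine should resolve the UnpicklingError.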
I'm also getting an error:
$bash extract_features.sh
Namespace(checkpoint_dir='/home/ajay/speech_emotion/funasr/emotion2vec_base/emotion2vec_base.pt', granularity='utterance', model_dir='/home/ajay/speech_emotion/funasr/emotion2vec_base', source_file='/home/ajay/speech_emotion/funasr/emotion2vec/scripts/test.wav', target_file='/home/ajay/speech_emotion/funasr/emotion2vec/code/emotion2vec/scripts/test.npy')
/home/ajay/speech_emotion/emotion-recognition-using-speech/venv/lib/python3.8/site-packages/fairseq/checkpoint_utils.py:315: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(f, map_location=torch.device("cpu"))
Traceback (most recent call last):
File "extract_features.py", line 70, in
Can someone let me know how to run inference? I am a newbie.
Thanks in advance, Ajay