🐛 Bug: Fine-tuning the model by following the tutorial fails with: forward() missing 4 required positional arguments: 'speech', 'speech_lengths', 'text', and 'text_lengths'
To Reproduce
Following the tutorial https://github.com/alibaba-damo-academy/FunASR/blob/main/examples/industrial_data_pretraining/paraformer/README_zh.md
the steps to fine-tune the model were:
- cd examples/industrial_data_pretraining/paraformer
- sh train_from_local.sh
- train.jsonl and val.jsonl were generated
The run then fails with: TypeError: forward() missing 4 required positional arguments: 'speech', 'speech_lengths', 'text', and 'text_lengths'
It looks as if the data was not loaded, but the paths in train.jsonl and val.jsonl have already been generated.
train.jsonl:
{"key": "BAC009S0764W0121", "source": "/home/shawn/workspace/work/Pythonproject/FunASR/data/wav/BAC009S0764W0121.wav", "source_len": 420, "target": "甚至出现交易几乎停滞的情况", "target_len": 13}
{"key": "BAC009S0916W0489", "source": "/home/shawn/workspace/work/Pythonproject/FunASR/data/wav/BAC009S0916W0489.wav", "source_len": 573, "target": "湖北一公司以员工名义贷款数十员工负债千万", "target_len": 20}
{"key": "asr_example_cn_en", "source": "/home/shawn/workspace/work/Pythonproject/FunASR/data/wav/asr_example_cn_en.wav", "source_len": 1474, "target": "所有只要处理 data 不管你是做 machine learning 做 deep learning 做 data analytics 做 data science 也好 scientist 也好通通都要都做的基本功啊那 again 先先对有一些也许对", "target_len": 19}
{"key": "ID0012W0014", "source": "/home/shawn/workspace/work/Pythonproject/FunASR/data/wav/asr_example_en(1).wav", "source_len": 222, "target": "he tried to think how it could be", "target_len": 8}
Environment
- OS: Ubuntu
- FunASR version: 1.0.22
- ModelScope version: 1.13.3
- PyTorch version: 2.0.1
- How you installed funasr: source
- Python version: 3.8
- GPU: 3070 Ti
- CUDA/cuDNN version: CUDA 11.7
- Docker version: none
- Model: iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
- Data: https://github.com/alibaba-damo-academy/FunASR/blob/main/data/list (wav files already downloaded)
Error message
[2024-04-09 17:37:19,356][root][INFO] - Build optim
[2024-04-09 17:37:19,361][root][INFO] - Build scheduler
[2024-04-09 17:37:19,362][root][INFO] - Build dataloader
[2024-04-09 17:37:19,362][root][INFO] - Build dataloader
[2024-04-09 17:37:19,363][root][INFO] - total_num of samplers across ranks: 4
[2024-04-09 17:37:19,363][root][INFO] - total_num of samplers across ranks: 2
[2024-04-09 17:37:19,363][root][WARNING] - distributed is not initialized, only single shard
[2024-04-09 17:37:19,381][root][INFO] - Train epoch: 0, rank: 0
Error executing job with overrides: ['++train_data_set_list=../../../data/list/train.jsonl', '++valid_data_set_list=../../../data/list/val.jsonl', '++dataset_conf.batch_size=2', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=4', '++train_conf.max_epoch=50', '++train_conf.log_interval=10', '++train_conf.resume=false', '++train_conf.validate_interval=15', '++train_conf.save_checkpoint_interval=15', '++train_conf.keep_nbest_models=50', '++optim_conf.lr=0.0002', '++init_param=/home/shawn/workspace/work/Pythonproject/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt', '++tokenizer_conf.token_list=/home/shawn/workspace/work/Pythonproject/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/tokens.json', '++frontend_conf.cmvn_file=/home/shawn/workspace/work/Pythonproject/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/am.mvn', '++output_dir=./outputs']
Traceback (most recent call last):
File "../../../funasr/bin/train.py", line 225, in
main_hydra()
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in
lambda: hydra.run(
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "../../../funasr/bin/train.py", line 48, in main_hydra
main(**kwargs)
File "../../../funasr/bin/train.py", line 185, in main
trainer.train_epoch(
File "/home/shawn/workspace/work/Pythonproject/FunASR/funasr/train_utils/trainer.py", line 290, in train_epoch
retval = model(**batch)
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
TypeError: forward() missing 4 required positional arguments: 'speech', 'speech_lengths', 'text', and 'text_lengths'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 18773) of binary: /home/shawn/anaconda3/envs/funasr_fineture/bin/python
Traceback (most recent call last):
File "/home/shawn/anaconda3/envs/funasr_fineture/bin/torchrun", line 8, in
sys.exit(main())
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/shawn/anaconda3/envs/funasr_fineture/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
../../../funasr/bin/train.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-04-09_17:37:24
host : shawn-desktop
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 18773)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
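The last frame before the TypeError is `retval = model(**batch)`, so the four arguments are missing because `batch` did not carry those keys at that step, for instance if it was an empty dict. The sketch below reproduces that symptom in plain Python. `forward` is a stand-in with the same parameter names as in the traceback, not FunASR's model. The sketch also shows one hypothetical way a batch could end up empty: a length filter whose budget is smaller than every sample. The run above passes `++dataset_conf.batch_size=2` together with `++dataset_conf.batch_type=token`, while the samples have `source_len` values between 222 and 1474. Whether FunASR's sampler actually behaves like `filter_by_budget` is an assumption that has not been verified here.

```python
def forward(speech, speech_lengths, text, text_lengths):
    """Stand-in with the parameter names from the traceback (not FunASR code)."""
    return len(speech)


def call_with_batch(batch):
    """Mimic `model(**batch)`; return the error message, or None on success."""
    try:
        forward(**batch)
    except TypeError as exc:
        return str(exc)
    return None


def filter_by_budget(sample_lengths, budget):
    """Hypothetical filter: keep only samples whose length fits the budget."""
    return [n for n in sample_lengths if n <= budget]


# source_len values copied from the train.jsonl entries in this report
source_lens = [420, 573, 1474, 222]

kept = filter_by_budget(source_lens, budget=2)  # no sample fits a budget of 2
empty_batch_error = call_with_batch({})         # TypeError about 4 missing arguments

print(kept)
print(empty_batch_error)
```

If the empty-batch explanation applies, raising `dataset_conf.batch_size` to a value larger than the longest sample, or printing `batch.keys()` just before `model(**batch)` in `trainer.py`, would be a quick way to confirm or rule it out.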