LiuXin

Results 4 comments of LiuXin

I ran into the same error as the original poster. Has anyone solved the ModelScope multi-GPU training problem, or is it an environment issue? Error output: Task related config: error: unrecognized arguments: --local-rank=0 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 185461) of binary: /opt/conda/envs/modelscope/bin/python Traceback (most recent call last): File "/opt/conda/envs/modelscope/lib/python3.8/runpy.py", line 194, in _run_module_as_main return...

I have the same question. Do you have a solution? Or is this related to the long time it takes to load collections when I deploy the interface?

> # 1. Construct the dataset > ``` > train.jsonl (each line): {"query_id": "111", "query": "吃饭的猫猫1", "image_id": "222", "image": "/path/to/cat_1.jpg"} > validation.jsonl (each line): {"query_id": "333", "query": "吃饭的猫猫2", "image_id": "444",...
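A minimal sketch of writing and reading records in the JSON Lines layout quoted above (one JSON object per line), assuming Python and only the four field names visible in the quote (`query_id`, `query`, `image_id`, `image`). The record reuses the example values from the quote; the image path is a placeholder, and the helper names `to_jsonl`, `from_jsonl` and `write_jsonl` are hypothetical. Whether a given training script expects exactly these keys is an assumption.

```python
import json

# Example record copied from the quoted train.jsonl format; the image path
# is a placeholder, not a real file.
train_records = [
    {"query_id": "111", "query": "吃饭的猫猫1", "image_id": "222", "image": "/path/to/cat_1.jpg"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    # ensure_ascii=False keeps the Chinese query text human-readable.
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_jsonl(path, records):
    """Write records to a UTF-8 file, since the queries contain non-ASCII text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))
```

Calling `write_jsonl("train.jsonl", train_records)` would produce a file with one line per record; a validation file would be written the same way from its own list of records.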

> Please check the training data. Format reference: (https://alibaba-damo-academy.github.io/FunASR/en/egs_modelscope/asr/TEMPLATE/README.html#finetune-with-your-data)

Hello, single-GPU training works fine for me, but multi-GPU training fails. My launch command is `CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node 2 finetune.py`, and the error is: Task related config: error: unrecognized arguments: --local-rank=0 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 185479) of...
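One possible cause of the `unrecognized arguments: --local-rank=0` error: starting with PyTorch 2.0, `torch.distributed.launch` passes the dashed `--local-rank=<rank>` to the script, while scripts written for older versions often declare only `--local_rank`. The parser below is a hypothetical sketch for a script you control. The actual argument handling of `finetune.py` and ModelScope is not shown in these comments, so whether this applies depends on which parser rejects the flag.

```python
import argparse
import os


def parse_args(argv=None):
    """Hypothetical argument parser for a distributed training script."""
    parser = argparse.ArgumentParser()
    # PyTorch >= 2.0 torch.distributed.launch passes --local-rank=<rank>;
    # older launchers passed --local_rank. Registering both option strings
    # with a single dest accepts either spelling.
    # If neither flag is given (e.g. under torchrun), fall back to the
    # LOCAL_RANK environment variable, defaulting to 0.
    parser.add_argument(
        "--local-rank",
        "--local_rank",
        dest="local_rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    return parser.parse_args(argv)
```

An alternative that sidesteps the flag entirely is launching with `torchrun --nproc_per_node 2 finetune.py`, which sets the `LOCAL_RANK` environment variable instead of passing a rank argument.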