LiuXin comments

Results: 4 comments of LiuXin

Multi-GPU training error

I ran into the same error as the original poster. Has anyone solved the multi-GPU training problem with modelscope, or is this an environment issue?

Task related config: error: unrecognized arguments: --local-rank=0 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 185461) of binary: /opt/conda/envs/modelscope/bin/python Traceback (most recent call last): File "/opt/conda/envs/modelscope/lib/python3.8/runpy.py", line 194, in _run_module_as_main return...

[Bug]: In milvus-standalone docker container. Issue related to `goroutine`, automatically exits with code `134` after some time. If no collection is loaded, no problem is caused.

I have the same problem. Do you have a solution? Or does this have something to do with collections staying loaded for a long time when I deploy the interface?

I now want to fine-tune CLIP on my own local dataset. How should I construct the data locally and then load the local dataset for training?

> # 1. Construct the dataset
> ```
> train.jsonl (each line): {"query_id": "111", "query": "吃饭的猫猫1", "image_id": "222", "image": "/path/to/cat_1.jpg"}
> validation.jsonl (each line): {"query_id": "333", "query": "吃饭的猫猫2", "image_id": "444",...
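
The quoted format can be produced with a few lines of Python. This is only a sketch: it uses the four fields visible in the quoted `train.jsonl` line (`query_id`, `query`, `image_id`, `image`), and because the quote is truncated the real format may need more. The helper names `make_record` and `write_jsonl` are made up for illustration and are not part of modelscope.

```python
import json
from pathlib import Path


def make_record(query_id, query, image_id, image_path):
    """Build one record using the field names from the quoted example."""
    return {
        "query_id": str(query_id),   # IDs are strings in the quoted example
        "query": query,              # caption / text query
        "image_id": str(image_id),
        "image": str(image_path),    # path to the local image file
    }


def write_jsonl(records, path):
    """Write one JSON object per line; keep non-ASCII text readable."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    train = [make_record(111, "吃饭的猫猫1", 222, "/path/to/cat_1.jpg")]
    write_jsonl(train, "train.jsonl")
```

How the resulting files are then loaded for training depends on the rest of the quoted answer, which is cut off here.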

Fine-tuning based on the damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch model: running finetune.py raises an error

> Please check training data, format reference (https://alibaba-damo-academy.github.io/FunASR/en/egs_modelscope/asr/TEMPLATE/README.html#finetune-with-your-data)

Hello, single-GPU training works fine for me, but multi-GPU training fails. My launch command is `CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node 2 finetune.py`. The error is as follows:

Task related config: error: unrecognized arguments: --local-rank=0 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 185479) of...
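
One plausible cause of `unrecognized arguments: --local-rank=0` is that newer PyTorch releases have `torch.distributed.launch` pass the local rank as `--local-rank` (hyphen), while older scripts only declare `--local_rank` (underscore). Assuming it is an `argparse` parser in the training code that rejects the flag (the thread does not confirm this), a minimal sketch of a parser that tolerates both spellings and falls back to the `LOCAL_RANK` environment variable looks like this. The function name `parse_local_rank` is hypothetical.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Return (local_rank, remaining_args) from CLI args or LOCAL_RANK."""
    parser = argparse.ArgumentParser()
    # Accept both spellings so either launcher behaviour works.
    parser.add_argument(
        "--local-rank", "--local_rank",
        dest="local_rank",
        type=int,
        # Fall back to the environment variable, then to 0.
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # parse_known_args leaves the script's other arguments untouched.
    args, remaining = parser.parse_known_args(argv)
    return args.local_rank, remaining


if __name__ == "__main__":
    rank, rest = parse_local_rank()
    print(f"local_rank={rank}, other args={rest}")
```

If the parser lives inside the library rather than in `finetune.py`, patching it may not be practical. In that case, launching with `torchrun --nproc_per_node 2 finetune.py` may be worth trying, since `torchrun` sets `LOCAL_RANK` in the environment instead of appending a command-line flag. Whether the training code reads that variable is not confirmed in this thread.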