hx358031364
```
AMP not enabled. Training in float32.
Using native Torch DistributedDataParallel.
Scheduled epochs: 310
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [15,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out...
```
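The `ScatterGatherKernel.cu` assertion usually means an index fed to a GPU gather/scatter (most often a class id consumed by the loss) is outside the valid range, e.g. a label greater than or equal to the number of model outputs. A framework-free pre-flight check can catch this before it becomes an opaque device-side assert; `validate_class_ids` below is a hypothetical helper, not part of any library:

```python
def validate_class_ids(labels, num_classes):
    """Raise if any class id would trip the CUDA gather/scatter bounds assert.

    `labels` is an iterable of integer class ids; `num_classes` is the size
    of the output dimension those ids index into.
    """
    bad = sorted({l for l in labels if not (0 <= l < num_classes)})
    if bad:
        raise ValueError(
            f"class ids {bad} are outside [0, {num_classes}); "
            "on GPU these trigger 'idx_dim >= 0 && idx_dim < index_size'"
        )
    return True

# a dataset labelled 0..102 needs num_classes == 103
print(validate_class_ids([0, 50, 102], 103))  # → True
```

Running this over the whole label set once, on CPU, gives a readable error instead of a kernel assert that kills the run.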
I added `multi_gpu_model`:

```python
model = keras.models.Model([model_body.input, *y_true], loss_list)
parallel_model = multi_gpu_model(model, 2)
```

replaced every subsequent use of `model` with `parallel_model`, and called `model.save` when saving the model, but it has no effect; training still runs on only one card.
When running the multi-GPU job I got `chunk expects at least a 1-dimensional tensor`. The command I ran was:

```shell
python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --data-path ./CocoDataset --dataset coco --num-classes 103 --batch-size 64 --epochs 50
```
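That error is raised when the scatter step tries to split a zero-dimensional tensor across devices: `torch.chunk` needs a leading batch dimension to slice along, so a bare scalar in the batch (e.g. an unsqueezed label or loss value) fails. A framework-free sketch of the splitting rule, with hypothetical names, shows why:

```python
def scatter_batch(values, num_devices):
    """Split a batch across devices the way a scatter/chunk step does.

    `values` must have a leading batch dimension (here: a length),
    mirroring torch.chunk, which rejects 0-dim tensors.
    """
    try:
        n = len(values)
    except TypeError:
        # a bare scalar has no batch dimension, like a 0-dim tensor
        raise ValueError("chunk expects at least a 1-dimensional tensor")
    step = (n + num_devices - 1) // num_devices
    return [values[i:i + step] for i in range(0, n, step)]

print(scatter_batch([1, 2, 3, 4], 2))  # → [[1, 2], [3, 4]]
```

In practice the fix is to make sure every tensor the dataloader returns has that leading dimension, e.g. wrapping scalar targets so the collated batch is at least 1-D.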
```python
cache_path = (p if p.is_file() else Path(self.label_files[0]).parent).with_suffix('.cache')  # cached labels
if cache_path.is_file():
    cache, exists = torch.load(cache_path), True  # load
    # if cache['hash'] != get_hash(self.label_files + self.img_files) or 'version' not in...
```
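The commented-out line invalidates the cache by comparing a stored fingerprint of the label and image file lists against a freshly computed one. A minimal sketch of such a `get_hash` is below; the real implementation in the repo may hash differently, this version fingerprints the combined file sizes only:

```python
import hashlib
import os

def get_hash(paths):
    """Fingerprint a list of files by their total size.

    If any file is added, removed, or changes size, the hash changes,
    so a stale '.cache' built from an older file list is detected.
    """
    total = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    return hashlib.md5(str(total).encode()).hexdigest()
```

A cache whose stored `'hash'` no longer matches `get_hash(self.label_files + self.img_files)` is then rebuilt instead of loaded.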