Bug when training with multiple GPUs

Open coderhss opened this issue 4 years ago • 4 comments

self.scale = torch.max(self.scale, self.eps)                                                    # processing for very small scale

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Training IAO on multiple GPUs raises this error. Has anyone else run into it?

coderhss, Jan 10 '21 03:01

For IAO, please use a single GPU for now.

666DZY666, Jan 10 '21 09:01

That is a problem: the model does not fit in memory on a single GPU.

coderhss, Jan 11 '21 06:01

Reduce --eval_batch_size.

666DZY666, Jan 11 '21 11:01

For IAO, I added device checks immediately before this line:

    output = (torch.clamp(self.round(input / self.scale - self.zero_point), self.quant_min_val, self.quant_max_val) + self.zero_point) * self.scale

The added checks are:

    if self.scale.device != input.device:
        self.scale = self.scale.to(input.device)
    if self.zero_point.device != input.device:
        self.zero_point = self.zero_point.to(input.device)
    if self.quant_min_val.device != input.device:
        self.quant_min_val = self.quant_min_val.to(input.device)
    if self.quant_max_val.device != input.device:
        self.quant_max_val = self.quant_max_val.to(input.device)

With these in place, multi-GPU training works. In short, use input.device directly instead of observer.device.

xiaoguoer, Mar 15 '21 08:03
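
The workaround in the last comment can be condensed into a small helper. This is only a minimal sketch, not the repository's code: `align_to_input` and `fake_quantize` are hypothetical names, the quantizer state is passed in as plain tensors rather than read from `self`, and `torch.round` stands in for the module's own `self.round`. It follows the same idea of moving each parameter to `input.device` before the quantize/dequantize expression.

```python
import torch


def align_to_input(input, *params):
    """Return params moved to input.device; tensors already there are reused.

    Hypothetical helper mirroring the per-tensor device checks in the thread.
    """
    return tuple(
        p if p.device == input.device else p.to(input.device) for p in params
    )


def fake_quantize(input, scale, zero_point, quant_min_val, quant_max_val):
    """Quantize then dequantize `input`, with all parameters on input.device.

    Assumption: torch.round replaces the module's self.round from the thread.
    """
    scale, zero_point, quant_min_val, quant_max_val = align_to_input(
        input, scale, zero_point, quant_min_val, quant_max_val
    )
    q = torch.round(input / scale - zero_point)
    # Clamp to the quantized range using element-wise min/max on tensors.
    q = torch.min(torch.max(q, quant_min_val), quant_max_val)
    return (q + zero_point) * scale


if __name__ == "__main__":
    x = torch.tensor([1.2, -0.4, 100.0])
    out = fake_quantize(
        x,
        scale=torch.tensor(0.5),
        zero_point=torch.tensor(0.0),
        quant_min_val=torch.tensor(-128.0),
        quant_max_val=torch.tensor(127.0),
    )
    print(out)
```

Returning moved copies instead of reassigning attributes keeps the helper free of side effects; whether that or the in-place reassignment from the comment suits the actual module better depends on how the observers store their state.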