apex
Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Hi,
I receive the error shown below when I try pure FP16 training (opt_level="O3"). When training with opt_level="O1", everything seems to work fine. I have attached a snippet of the code with the relevant parts. I believe I have followed your documentation, but maybe I am missing something.
Thanks for your help.
PyTorch: 1.4.0, CUDA: 10.1
Code snippet:
```python
...
from apex.fp16_utils import *
from apex import amp, optimizers
...
model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, eps=10**-7)
model, optimizer = amp.initialize(model, optimizer, opt_level="O3")
for epoch in range(nb_epoch):
    optimizer = lr_scheduler(optimizer, epoch)
    for i, inputs in enumerate(train_loader):
        inputs = inputs.permute(0, 1, 4, 2, 3)
        inputs = inputs.cuda()
        errors = model(inputs)
        errors = errors.float()
        loc_batch = errors.size(0)
        errors = torch.mm(errors.view(-1, nt), time_loss_weights)
        errors = torch.mm(errors.view(loc_batch, -1), layer_loss_weights)
        errors = torch.mean(errors)
        optimizer.zero_grad()
        with amp.scale_loss(errors, optimizer) as scaled_loss:
            scaled_loss.backward()
        #errors.backward()
        optimizer.step()
```
Error:
```
Selected optimization level O3: Pure FP16 training.
Defaults for this optimization level are:
enabled : True
opt_level : O3
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : False
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O3
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : False
master_weights : False
loss_scale : 1.0
Traceback (most recent call last):
  File "train_t_1.py", line 96, in <module>
    errors = model(inputs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/bernard/Projects/PrednetConvLSTMPytorch/PredNetOriginal.py", line 141, in forward
    Rep, Cell = cell(tmp, Cell)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/Projects/PrednetConvLSTMPytorch/ConvLSTMCellPredNet.py", line 41, in forward
    i_t = torch.sigmoid(self.W_i(inputs)) #Bias included in self.W_.. initialization
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
```
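One pattern that can produce this kind of traceback under O2/O3, offered as a guess since the ConvLSTM cell's code is not shown: apex casts the tensors passed to the model's `forward` (the `input_caster` frame in the traceback), but a tensor allocated inside `forward` with the default dtype, for example an initial hidden or cell state, stays FP32 and then meets FP16 weights. A minimal sketch of allocating such a state from the input so that it inherits the input's dtype and device; `TinyConvCell` and `init_state` are hypothetical names, not the poster's code:

```python
import torch
import torch.nn as nn


class TinyConvCell(nn.Module):
    """Hypothetical stand-in for a ConvLSTM-style cell; not the poster's code."""

    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.hid_ch = hid_ch
        self.W_i = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size=3, padding=1)

    def init_state(self, x):
        # new_zeros inherits dtype and device from x, so the state follows
        # whatever precision the input was cast to.
        b, _, h, w = x.shape
        return x.new_zeros(b, self.hid_ch, h, w)

    def forward(self, x, state=None):
        if state is None:
            state = self.init_state(x)
        return torch.sigmoid(self.W_i(torch.cat([x, state], dim=1)))


cell = TinyConvCell(in_ch=3, hid_ch=4)
x = torch.rand(2, 3, 8, 8)
out = cell(x)  # FP32 forward pass, runs on CPU

# The state also follows an FP16 input (only the allocation is shown here,
# so the sketch does not depend on FP16 convolution support on CPU).
state_fp16 = cell.init_state(x.half())
print(out.dtype, state_fp16.dtype)
```

If the state were instead created with a bare `torch.zeros(...)`, it would be FP32 regardless of the input, which is the kind of mismatch the error message describes.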
I am facing the same error with 'O2'. Have you found a solution?
@BenQLange I found that this error only appears when I run the code on multiple GPUs; it works fine on a single GPU. Also, with multiple GPUs, memory usage seems to be much higher than on a single GPU, even with 'O1'.
In my case, the reason is that I used a non-official SyncBatchNorm implementation and it seems apex couldn't deal with it.
I got the same error message when feeding weights trained on the Global Wheat (Kaggle) dataset to the prediction step.
I encountered exactly the same error. Does anyone have a good solution? Thanks!
I'm also facing the same issue. I trained a PyTorch YOLOv5 model and then tried to integrate it with a Flask API:
```python
class Model(object):
    def __init__(self, model):
        self.device = torch_utils.select_device()
        print(self.device)
        model = torch.load(model, map_location=self.device)['model']
        self.half = False and self.device.type != 'cpu'
        print('half = ' + str(self.half))
        if self.half:
            model.half()
        model = model.to(self.device).eval()
        self.loaded_model = model

    def predict(self, img):
        global session
        # img = torch.zeros((1, 3, 640, 640), device=self.device)
        img1 = torch.from_numpy(img).to(self.device)
        img = img1.reshape(1, 3, 640, 640)
        img = img.half() if self.half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        print(img.ndimension())
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        print(self.loaded_model)
        img = img.to(self.device)
        self.preds = self.loaded_model(img, augment=False)[0]
        return self.preds
```
In my camera.py file I read the video frame by frame and get a prediction for each frame, as below:
```python
model = FacecoverDetectModel("weights/best.pt")


class Camera(object):
    def __init__(self):
        self.video = cv2.VideoCapture(0)

    def __del__(self):
        self.video.release()

    def get_frame(self):
        _, fr = self.video.read()
        loader = transforms.Compose([transforms.ToTensor()])
        image = cv2.resize(fr, (640, 640), interpolation=cv2.INTER_AREA)
        input_im = image.reshape(1, 640, 640, 3)
        pil_im = Image.fromarray(fr)
        image = loader(pil_im).float()
        # image = Variable(image, requires_grad=True)
        image = image.unsqueeze(0)
        pred = model.predict(input_im)
        print(pred)
        _, jpeg = cv2.imencode('.jpg', fr)
        return jpeg.tobytes()
```
Any ideas, please?
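A possible explanation, stated as an assumption rather than a diagnosis: if the checkpoint stores the model in FP16, then with `self.half` always `False` the weights stay FP16 while `img.float()` produces an FP32 input. A small sketch of casting the input to whatever dtype and device the model's parameters actually use; `match_input_to_model` is a hypothetical helper, and the `Conv2d` layer is only a toy stand-in for the loaded model:

```python
import torch
import torch.nn as nn


def match_input_to_model(x, model):
    """Return x cast and moved to the dtype/device of the model's first parameter."""
    param = next(model.parameters())
    return x.to(device=param.device, dtype=param.dtype)


# Toy stand-in for a model whose weights were stored in FP16.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).half()
img = torch.rand(1, 3, 16, 16)  # FP32 input, like the result of img.float()

img_matched = match_input_to_model(img, model)
print(img.dtype, "->", img_matched.dtype)
```

The alternative is to go the other way and call `model.float()` right after loading, which keeps the whole pipeline in FP32.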
I encountered a similar error: 'RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same'. I would really appreciate a good solution. Thanks in advance!
Same here
A torch.cuda.HalfTensor appears when an operation in the network implicitly changes a tensor's precision, and with it its memory size. The easiest workaround is to cast the smaller (FP16) tensor up to the larger (FP32) type, for example with .float(). Going the other way round, i.e. casting the larger tensor down to the smaller type, won't do the trick.
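As a sketch of the cast-up workaround described above, assuming a toy `Conv2d` layer in place of a real network: both the weights and the input are brought to FP32 with `.float()` before the forward pass, so the input type and weight type agree.

```python
import torch
import torch.nn as nn

# Toy layer with FP16 weights and an FP32 input: the mismatched situation.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).half()
x = torch.rand(1, 3, 16, 16)

# Cast everything up to FP32; .float() is a no-op for tensors already in FP32.
model = model.float()
x = x.float()

with torch.no_grad():
    out = model(x)
print(out.dtype, tuple(out.shape))
```

This gives up the memory and speed benefits of FP16, so it is mainly useful for inference or for confirming that a dtype mismatch is the actual problem.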