Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Open • BenQLange opened this issue 4 years ago • 8 comments

Hi,

I receive the error shown below when I try FP16 training (opt_level="O3"). When training with opt_level="O1", everything seems to work fine. I have attached a snippet of the code with the relevant parts. I believe I have followed your documentation, but maybe I am missing something.

Thanks for the help.

PyTorch: 1.4.0, CUDA: 10.1

Code snippet:

...
from apex.fp16_utils import *
from apex import amp, optimizers 
...

model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, eps=1e-7)

model, optimizer = amp.initialize(model, optimizer, opt_level="O3")

for epoch in range(nb_epoch):

    optimizer = lr_scheduler(optimizer, epoch)

    for i, inputs in enumerate(train_loader):

        inputs = inputs.permute(0, 1, 4, 2, 3)
        inputs = inputs.cuda()
        errors = model(inputs)
        errors = errors.float()
        loc_batch = errors.size(0)
        errors = torch.mm(errors.view(-1, nt), time_loss_weights)
        errors = torch.mm(errors.view(loc_batch, -1), layer_loss_weights)
        errors = torch.mean(errors)
        optimizer.zero_grad()
        with amp.scale_loss(errors, optimizer) as scaled_loss:
            scaled_loss.backward()
            #errors.backward()
        optimizer.step()

Error:


Selected optimization level O3:  Pure FP16 training.
Defaults for this optimization level are:
enabled                : True
opt_level              : O3
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : False
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O3
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : False
master_weights         : False
loss_scale             : 1.0
Traceback (most recent call last):
  File "train_t_1.py", line 96, in <module>
    errors = model(inputs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/bernard/Projects/PrednetConvLSTMPytorch/PredNetOriginal.py", line 141, in forward
    Rep, Cell = cell(tmp, Cell)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/Projects/PrednetConvLSTMPytorch/ConvLSTMCellPredNet.py", line 41, in forward
    i_t = torch.sigmoid(self.W_i(inputs)) #Bias included in self.W_.. initialization
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same


BenQLange • Mar 18 '20
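A note on the trace above: with O3, apex casts the model's parameters to FP16 and patches the top-level forward to cast its inputs, but tensors created inside forward (for example a ConvLSTM's initial cell state allocated with torch.zeros) still default to FP32 and then hit an FP16 conv weight. Below is a minimal sketch of keeping such internal state in the input's dtype; ConvLSTMCell is a hypothetical stand-in, not the actual code from ConvLSTMCellPredNet.py:

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    # Hypothetical stand-in for the cell in ConvLSTMCellPredNet.py.
    def __init__(self, in_ch, hidden_ch):
        super().__init__()
        self.W_i = nn.Conv2d(in_ch + hidden_ch, hidden_ch, 3, padding=1)

    def forward(self, x, state=None):
        if state is None:
            b, _, h, w = x.shape
            # Match the (possibly FP16) input's dtype and device instead of
            # a plain torch.zeros(...), which defaults to FP32 and triggers
            # "Input type ... and weight type ... should be the same".
            state = torch.zeros(b, self.W_i.out_channels, h, w,
                                dtype=x.dtype, device=x.device)
        i_t = torch.sigmoid(self.W_i(torch.cat([x, state], dim=1)))
        return i_t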

Facing the same error while using 'O2'. Have you found a solution, @BenQLange? I found this error only appears when I run the code on multiple GPUs; it is fine on a single GPU. Also, when using multiple GPUs, even with 'O1', memory usage seems to be much higher than on a single GPU.

jackroos • Jun 03 '20
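For the multi-GPU case: the apex examples call amp.initialize before wrapping the model, and apex ships its own DistributedDataParallel. A minimal sketch, assuming one process per GPU launched via torch.distributed.launch (which supplies --local_rank and the rendezvous environment variables), with MyModel as a placeholder for the model from the original post:

import argparse
import torch
from apex import amp
from apex.parallel import DistributedDataParallel as DDP

# --local_rank is supplied by torch.distributed.launch (assumption).
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

torch.distributed.init_process_group(backend='nccl')
torch.cuda.set_device(args.local_rank)

model = MyModel().cuda()  # MyModel: placeholder from the original post
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# amp.initialize first, then wrap with apex's DDP (per the apex examples).
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
model = DDP(model)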

In my case, the reason was that I used a non-official SyncBatchNorm implementation, and it seems apex couldn't handle it.

jackroos • Jun 04 '20
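apex does ship an official sync BN conversion. A minimal sketch, assuming the BatchNorm layers should be swapped before amp.initialize (MyModel is again a placeholder):

import torch
from apex import amp
from apex.parallel import convert_syncbn_model

model = MyModel().cuda()  # MyModel: placeholder model
# Swap torch.nn.BatchNorm*d layers for apex's SyncBatchNorm before amp.
model = convert_syncbn_model(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
model, optimizer = amp.initialize(model, optimizer, opt_level='O2')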

I faced the same error message when feeding weights trained for the Global Wheat (Kaggle) competition into prediction.

sailfish009 • Jul 01 '20

I encountered exactly the same error. Does anyone have a good solution? Thanks!

MingLunHan • Nov 20 '20

I'm also facing the same issue. I trained a PyTorch YOLOv5 model and then tried to integrate it with a Flask API:

class Model(object):

    def __init__(self, model):

        self.device = torch_utils.select_device()
        print(self.device)
        model = torch.load(model, map_location=self.device)['model']

        self.half = False and self.device.type != 'cpu'
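        # NOTE: `False and ...` above always evaluates to False, so
        # model.half() below is never reached.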
        print('half = ' + str(self.half))

        if self.half:
            model.half()

        model = model.to(self.device).eval()

        self.loaded_model = model

    def predict(self, img):
        global session
        # img = torch.zeros((1, 3, 640, 640), device=self.device)
        img1 = torch.from_numpy(img).to(self.device)
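        # NOTE: reshape on the next line reinterprets the (1, 640, 640, 3)
        # HWC buffer as (1, 3, 640, 640) without permuting axes; a
        # permute(0, 3, 1, 2) call would be needed to actually move the
        # channel dimension.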
        img = img1.reshape(1, 3, 640, 640)
        img = img.half() if self.half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        print(img.ndimension())
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        print(self.loaded_model)
        img = img.to(self.device)

        self.preds = self.loaded_model(img, augment=False)[0]
        return self.preds

In my camera.py file I read frame by frame and get a prediction as below:

model = FacecoverDetectModel("weights/best.pt")

class Camera(object):
    def __init__(self):
        self.video = cv2.VideoCapture(0)

    def __del__(self):
        self.video.release()

    def get_frame(self):
        _, fr = self.video.read()
        loader = transforms.Compose([transforms.ToTensor()])

        image = cv2.resize(fr, (640, 640), interpolation=cv2.INTER_AREA)
        input_im = image.reshape(1, 640, 640, 3)

        pil_im = Image.fromarray(fr)
        image = loader(pil_im).float()
        # image = Variable(image, requires_grad=True)
        image = image.unsqueeze(0)


        pred = model.predict(input_im)
        print(pred)
        _, jpeg = cv2.imencode('.jpg', fr)
        return jpeg.tobytes()

Any ideas, please?

dulangaheshan • Nov 26 '20
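A note on the snippet above: since `self.half` is always False, the input is cast to float32, while a YOLOv5 checkpoint loaded via torch.load may already hold FP16 weights, which reproduces exactly this error. A minimal sketch of aligning the two, assuming the checkpoint layout from the post ('weights/best.pt' with a 'model' key):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('weights/best.pt', map_location=device)['model']

# Option 1: run everything in FP32.
model = model.float().to(device).eval()

# Option 2 (GPU only): keep the FP16 weights and cast the input to match.
# model = model.half().to(device).eval()
# img = img.half()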

I encountered a similar error: 'RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same'. Hoping for a good solution, thanks in advance!

Cassieyy • Oct 29 '21

Same here

danielrotaermel • Nov 16 '21

The torch.cuda.HalfTensor usually appears when some operation in the network implicitly changes a tensor's dtype. The easiest workaround is to cast the lower-precision tensor up to the higher-precision dtype (e.g. with .float()). Applying the cast the other way round (i.e. down to the lower-precision dtype) won't do the trick.

Amarkr1 • Feb 15 '22
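As a concrete illustration of that suggestion (a minimal, self-contained sketch assuming a CUDA device, not code from this thread):

import torch

a = torch.randn(2, 3, device='cuda', dtype=torch.float16)
b = torch.randn(3, 4, device='cuda', dtype=torch.float32)

# torch.mm(a, b) would raise a dtype-mismatch RuntimeError.
# Cast the lower-precision tensor up instead:
c = torch.mm(a.float(), b)
print(c.dtype)  # torch.float32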