DCNv2

Apex training not supported!

Open · ming71 opened this issue 5 years ago · 3 comments

When I try to use Apex for mixed-precision training, the following error appears:

expected scalar type Float but found Half (data<float> at /home/bit530/anaconda3/envs/torch1.1/lib/python3.7/site-packages/torch/include/ATen/core/TensorMethods.h:1821)

So I converted all variables to torch.float16, but it still doesn't work. It seems that the variable definitions in your CUDA source code restrict the input to float32, right?

    // All of these batched pointer arrays are typed as float pointers,
    // i.e. this code path assumes float32 data.
    int matrices_size = batch * sizeof(float *);
    auto input_b = static_cast<const float **>(THCudaMalloc(state, matrices_size));
    auto output_b = static_cast<float **>(THCudaMalloc(state, matrices_size));
    auto columns_b = static_cast<float **>(THCudaMalloc(state, matrices_size));
    auto ones_b = static_cast<const float **>(THCudaMalloc(state, matrices_size));
    auto weight_b = static_cast<const float **>(THCudaMalloc(state, matrices_size));
    auto bias_b = static_cast<const float **>(THCudaMalloc(state, matrices_size));

ming71 · Dec 18 '19 03:12

Any progress?

jasonkena · Feb 21 '20 04:02

Same problem here. Has anyone solved it?

zcode86 · Jun 23 '20 09:06

I solved it by casting the inputs to 32-bit tensors first and then casting the results back to 16-bit. You can see my implementation here: https://github.com/jasonkena/yolact/tree/amp/external/DCNv2

jasonkena · Jun 23 '20 12:06
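A minimal PyTorch sketch of the cast-to-fp32-and-back workaround described in the last comment. It is not the code from the linked repository. `dcn_forward_fp32_only` and `dcn_forward_mixed` are hypothetical names: the first stands in for the compiled DCNv2 op and, like the CUDA code quoted above, rejects anything that is not float32. Its body is a plain conv2d placeholder (it ignores `offset`), not deformable convolution, so the example runs on CPU.

```python
import torch
import torch.nn.functional as F


def dcn_forward_fp32_only(input, offset, weight, bias):
    """Hypothetical stand-in for the compiled DCNv2 op (not its real API).

    Accepts float32 tensors only, mimicking the float-typed CUDA code.
    The body is a 3x3 conv2d placeholder that ignores `offset`; it is NOT
    deformable convolution, just enough to make this sketch runnable.
    """
    for t in (input, offset, weight, bias):
        if t.dtype != torch.float32:
            raise RuntimeError(f"expected scalar type Float but found {t.dtype}")
    return F.conv2d(input, weight, bias, padding=1)


def dcn_forward_mixed(input, offset, weight, bias):
    """Call the fp32-only op from fp16 (mixed-precision) code.

    Upcasts every argument to float32, runs the op, then casts the result
    back to the dtype of `input`. `.float()` and `.to()` are differentiable,
    so gradients flow back to the original half-precision tensors.
    """
    out_dtype = input.dtype
    out = dcn_forward_fp32_only(
        input.float(), offset.float(), weight.float(), bias.float()
    )
    return out.to(out_dtype)


# Usage on CPU with half-precision tensors (shapes are illustrative).
x = torch.randn(1, 4, 8, 8).half().requires_grad_()
offset = torch.zeros(1, 18, 8, 8).half()  # 2 * 3 * 3 channels for a 3x3 kernel
w = torch.randn(6, 4, 3, 3).half()
b = torch.zeros(6).half()

try:
    dcn_forward_fp32_only(x, offset, w, b)  # fails: the inputs are Half
except RuntimeError as err:
    print("direct call:", err)

y = dcn_forward_mixed(x, offset, w, b)
y.float().sum().backward()  # reduce in fp32, then backprop through the casts
print(y.dtype, tuple(y.shape), x.grad.dtype)
```

The trade-off is that the op itself still runs in fp32: the wrapper only removes the dtype mismatch, it does not give the deformable convolution any half-precision speedup. Casting the output back to `input.dtype` assumes the surrounding fp16 code expects a result in the same dtype as its input.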