AnnaTest
msvsr_reds
### Bug description During training, I find that each `next()` call on the dataloader takes 10~20 s. I already raised `num_workers` from 8 to 32, but it still spends a long time in...
### PR types Others ### PR changes Others ### Description Others
## Description When I convert an ONNX model to TensorRT, it always fails with an error like: `Error[1]: [defaultAllocator.cpp::deallocateAsync::64] Error Code 1: Cuda Runtime (operation not supported)` XXX failure of TensorRT X.Y when running XXX...
I find that Conv+BN can't be fused with ReLU, and the Conv+BN kernel's output type is always FP32, which is very slow, slower than FP16 and INT8. ``` import torch import torchvision import...
### PR types Others ### PR changes Others ### Description Fix a bug that occurs when `use_flas_attention` is 0