AnnaTest

Results: 10 issues of AnnaTest

msvsr_reds

### Bug description During training, I find that `next()` on the dataloader takes 10~20 s. I have already raised `num_workers` from 8 to 32, but it still spends a long time in...

question
data handling
ver: 2.1.x
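A common first diagnostic for slow `next()` calls is to time the fetch alone, outside the training step. This shows whether the delay is concentrated in the first batch (for example, worker startup) or hits every batch. The following is a minimal sketch that assumes any Python iterable, such as a torch `DataLoader`. `time_batches` and `slow_source` are hypothetical helpers for illustration, not part of PyTorch or Lightning.

```python
import time
from typing import Iterable, List


def time_batches(loader: Iterable, max_batches: int = 10) -> List[float]:
    """Return the wall-clock seconds spent in each next() call on `loader`.

    Hypothetical helper: `loader` may be any iterable (e.g. a torch DataLoader).
    Only the fetch is timed; no training work happens inside the loop.
    """
    it = iter(loader)  # multi-process loaders typically start their workers here
    fetch_times: List[float] = []
    for _ in range(max_batches):
        start = time.perf_counter()
        try:
            next(it)  # the batch is discarded; we only care how long the fetch took
        except StopIteration:
            break
        fetch_times.append(time.perf_counter() - start)
    return fetch_times


def slow_source(n: int, delay: float):
    """Hypothetical stand-in for a slow data pipeline: sleeps `delay` s per item."""
    for i in range(n):
        time.sleep(delay)
        yield i


times = time_batches(slow_source(3, 0.01))
print([round(t, 3) for t in times])
```

If only the first entry is large, the cost is likely per-epoch setup. If every entry is large, per-sample loading or transforms are the more likely bottleneck, and adding workers may not help.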

### PR types Others ### PR changes Others ### Description Others

## Description When I convert ONNX to TensorRT, it always fails with an error like: `Error[1]: [defaultAllocator.cpp::deallocateAsync::64] Error Code 1: Cuda Runtime (operation not supported)` XXX failure of TensorRT X.Y when running XXX...

### PR types Others ### PR changes Others ### Description Others

stale

I find that Conv+BN can't be fused with ReLU, and the Conv+BN kernel's output type is always FP32, which is very slow, slower than FP16 and INT8. ``` import torch import torchvision import...

Module:Performance
triaged

### PR types Others ### PR changes Others ### Description Fix a bug when `use_flash_attention` is 0