cyfwry

3 issues by cyfwry

Hello, I cannot understand one sentence in your paper: 'When the training error keeps unchanged in five sequential epochs, we merge the parameters of each batch normalization into the adjacent convolution...
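For context, the usual way to "merge" a batch normalization layer into the convolution before it is standard BN folding. Whether the paper does exactly this is an assumption. The sketch assumes inference-mode BN (fixed running mean and variance) applied directly after the convolution. Each output channel `o` is rescaled by `gamma[o] / sqrt(var[o] + eps)`, and the BN shift is absorbed into the bias. A 1x1 convolution at a single pixel stands in for a general convolution; the per-channel rescaling works the same way for any kernel size. All function names are hypothetical helpers, not taken from the paper's code.

```python
import math


def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the preceding convolution (sketch).

    weight: one flattened kernel (list of floats) per output channel
    bias:   one bias per output channel
    gamma, beta, mean, var: BN parameters / running statistics per output channel
    """
    new_weight, new_bias = [], []
    for o in range(len(weight)):
        scale = gamma[o] / math.sqrt(var[o] + eps)  # per-output-channel factor
        new_weight.append([w * scale for w in weight[o]])
        new_bias.append((bias[o] - mean[o]) * scale + beta[o])
    return new_weight, new_bias


def conv1x1(weight, bias, x):
    # A 1x1 convolution at one spatial position is an affine map over channels.
    return [sum(w * xi for w, xi in zip(weight[o], x)) + bias[o]
            for o in range(len(weight))]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode BN using fixed running statistics.
    return [gamma[o] * (y[o] - mean[o]) / math.sqrt(var[o] + eps) + beta[o]
            for o in range(len(y))]


W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

reference = batch_norm(conv1x1(W, b, x), gamma, beta, mean, var)
Wf, bf = fold_bn_into_conv(W, b, gamma, beta, mean, var)
folded = conv1x1(Wf, bf, x)
print(all(abs(r - f) < 1e-9 for r, f in zip(reference, folded)))  # True
```

After folding, the network has one fewer layer but computes the same function. That is why such a merge is typically done once the BN statistics have stopped changing.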

Fixes issue #322: `qk_scale` can now be used by the Swin FMHA (fused multi-head attention) kernel. Signed-off-by: liangtao07
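To show what `qk_scale` controls, here is a minimal, unfused, single-head attention sketch in plain Python. It is not FasterTransformer code, and the function names are hypothetical. In the reference Swin Transformer implementation, `WindowAttention` uses `qk_scale or head_dim ** -0.5` as the factor multiplying the query-key dot products. This sketch follows the same idea: it uses `qk_scale` when one is given and falls back to `1/sqrt(head_dim)` otherwise. An attention kernel that ignores `qk_scale` would always take the fallback path.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, qk_scale=None):
    """Single-head scaled dot-product attention (illustrative sketch).

    q, k, v: lists of token vectors, each of length head_dim.
    qk_scale: overrides the default 1/sqrt(head_dim) scale when not None.
    """
    head_dim = len(q[0])
    scale = qk_scale if qk_scale is not None else head_dim ** -0.5
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


q = k = v = [[1.0, 0.0], [0.0, 1.0]]
default_out = attention(q, k, v)               # scale = 2 ** -0.5
custom_out = attention(q, k, v, qk_scale=2.0)  # scale overridden to 2.0
print(default_out != custom_out)  # True: the override changes the result
```

With `QK_SCALE=2.0` in the config, an implementation that honors the parameter should produce `custom_out`-style results. One that silently ignores it would match `default_out`.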

### Description

```shell
branch: v5.0
gpu: T4
```

### Steps to Reproduce

```shell
1. clone and compile FasterTransformer
2. cd examples/pytorch/swin
3. modify QK_SCALE in Swin-Transformer-Quantization/SwinTransformer/configs/swin_tiny_patch4_window7_224.yaml, e.g. QK_SCALE=2.0
4. sh run_test.sh...
```
