Cao E
There are still some optimizations to be done. For example, reducing the mixing of AVX512 and AVX2 instructions. Sorry for the inconvenience.
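As a rough illustration of the concern above (not FBGEMM's actual kernel; the function, shapes, and loop structure are assumptions), keeping a single vector width inside a hot loop avoids bouncing between AVX512 and AVX2 instructions:

```cpp
#include <immintrin.h>
#include <cstddef>

// Illustrative sketch: the main loop uses only 512-bit operations, and the
// remainder is handled with scalar code rather than an AVX2 (256-bit) tail,
// so one kernel does not alternate between vector widths.
void scale_avx512(float* p, float s, std::size_t n) {
  const __m512 vs = _mm512_set1_ps(s);
  std::size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    __m512 v = _mm512_loadu_ps(p + i);
    _mm512_storeu_ps(p + i, _mm512_mul_ps(v, vs));
  }
  for (; i < n; ++i) p[i] *= s;  // scalar tail instead of a 256-bit loop
}
```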
Hi @jianyuh, I found that the parameter data types of `transpose_avx512` and `transpose_simd` are not aligned:
```
template <typename T>
FBGEMM_API void transpose_simd(
    unsigned M,
    unsigned N,
    const T* src,
    unsigned ld_src,
    ...
```
Hi @jiyuanzFB, could you please share the shapes and the test environment? What is the difference between the CI and the internal test environment? I cannot reproduce the issue in...
Make the types of `M`, `N`, `ld_src`, and `ld_dst` consistent.
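A minimal sketch of what aligned declarations could look like (the choice of `int64_t` and the exact signatures are assumptions for illustration, not FBGEMM's final API):

```cpp
#include <cstdint>

// Hypothetical aligned declarations: both entry points use the same integer
// type for the dimension (M, N) and leading-dimension (ld_src, ld_dst)
// parameters, so callers cannot pass mismatched types to one of them.
template <typename T>
void transpose_simd(
    int64_t M, int64_t N, const T* src, int64_t ld_src, T* dst, int64_t ld_dst);

template <typename T>
void transpose_avx512(
    int64_t M, int64_t N, const T* src, int64_t ld_src, T* dst, int64_t ld_dst);
```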
Hi @jianyuh @jiyuanzFB, could you check whether this fixes the internal UT error? Thank you very much.
Moved the fix for the empty-input issue of convolution with the channels-last memory format to https://github.com/pytorch/pytorch/pull/86521.
@gpetters94 May I know which tests cover this change, and how we use `conv_forwards` and `conv_transpose2d_input` as defined in `serialized_shape_function_registry.cpp` or `_shape_functions.py`? It seems the code below can't utilize `conv_forwards`...
This PR will expose issues when `ATEN_CPU_CAPABILITY=default`: some reduce ops seem to have an accuracy gap compared to eager.
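A minimal way to surface such a gap (the shape and the use of `sum` are assumptions for illustration, not taken from the PR) is to compare a float reduction against a double-precision reference, running the program once normally and once with `ATEN_CPU_CAPABILITY=default`:

```cpp
#include <torch/torch.h>
#include <iostream>

// Compare a float reduction against a double-precision reference.
// A noticeably larger error when run under ATEN_CPU_CAPABILITY=default
// points at the non-vectorized reduction path.
int main() {
  torch::manual_seed(0);
  auto x = torch::randn({4096, 4096});
  auto ref = x.to(torch::kDouble).sum();   // high-precision reference
  auto got = x.sum().to(torch::kDouble);   // float reduction under test
  std::cout << "abs error: " << (ref - got).abs().item<double>() << "\n";
  return 0;
}
```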
@malfet We didn't consider the case where the input and gradient memory formats are not aligned. Can we move the `.contiguous` call into the `native_group_norm_backward` function in `aten/src/ATen/native/group_norm.cpp` and make the memory...
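A minimal sketch of the idea (a standalone helper using the public libtorch API, not the actual ATen change; the helper name and its placement are assumptions): align the incoming gradient with the input's suggested memory format before the backward kernel runs, so callers don't have to.

```cpp
#include <torch/torch.h>

// Hypothetical helper illustrating the proposal: if X is channels-last,
// suggest_memory_format() returns ChannelsLast and the gradient is made
// contiguous in that format, so the kernel sees matching layouts.
torch::Tensor align_grad_to_input(const torch::Tensor& dY,
                                  const torch::Tensor& X) {
  return dY.contiguous(X.suggest_memory_format());
}
```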
It depends on https://github.com/pytorch/pytorch/pull/97430