Cao E
There are still some optimizations to be done. For example, reducing the mixing of AVX512 and AVX2 instructions. Sorry for the inconvenience.
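As a rough illustration of the concern above (not FBGEMM's actual kernel; the function, shapes, and loop structure are assumptions), keeping a single vector width inside a hot loop avoids bouncing between AVX512 and AVX2 instructions:

```cpp
#include <immintrin.h>
#include <cstddef>

// Illustrative sketch: the main loop uses only 512-bit operations, and the
// remainder is handled with scalar code rather than an AVX2 (256-bit) tail,
// so one kernel does not alternate between vector widths.
void scale_avx512(float* p, float s, std::size_t n) {
  const __m512 vs = _mm512_set1_ps(s);
  std::size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    __m512 v = _mm512_loadu_ps(p + i);
    _mm512_storeu_ps(p + i, _mm512_mul_ps(v, vs));
  }
  for (; i < n; ++i) p[i] *= s;  // scalar tail instead of a 256-bit loop
}
```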
Hi @jianyuh, I found that the parameter data types of `transpose_avx512` and `transpose_simd` are not aligned:
```
template <typename T>
FBGEMM_API void transpose_simd(
    unsigned M,
    unsigned N,
    const T* src,
    unsigned ld_src,
    ...
```
Hi @jiyuanzFB, could you please share the shapes and the test environment? What is the difference between the CI and the internal test environment? I cannot reproduce the issue in...
Make the types of `M`, `N`, `ld_src`, and `ld_dst` consistent.
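A minimal sketch of what aligned declarations could look like (the choice of `int64_t` and the exact signatures are assumptions for illustration, not FBGEMM's final API):

```cpp
#include <cstdint>

// Hypothetical aligned declarations: both entry points use the same integer
// type for the dimension (M, N) and leading-dimension (ld_src, ld_dst)
// parameters, so callers cannot pass mismatched types to one of them.
template <typename T>
void transpose_simd(
    int64_t M, int64_t N, const T* src, int64_t ld_src, T* dst, int64_t ld_dst);

template <typename T>
void transpose_avx512(
    int64_t M, int64_t N, const T* src, int64_t ld_src, T* dst, int64_t ld_dst);
```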
Hi @jianyuh @jiyuanzFB, could you check whether this fixes the internal UT error? Thank you very much.
Moved the fix for the empty-input issue of convolution with the channels-last memory format to https://github.com/pytorch/pytorch/pull/86521.
@gpetters94 May I know which tests cover this change, and how we use `conv_forwards` and `conv_transpose2d_input` as defined in `serialized_shape_function_registry.cpp` or `_shape_functions.py`? It seems the code below can't utilize `conv_forwards`...
This PR will expose issues when `ATEN_CPU_CAPABILITY=default`: some reduce ops seem to have an accuracy gap compared to eager.
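A minimal way to surface such a gap (the shape and the use of `sum` are assumptions for illustration, not taken from the PR) is to compare a float reduction against a double-precision reference, running the program once normally and once with `ATEN_CPU_CAPABILITY=default`:

```cpp
#include <torch/torch.h>
#include <iostream>

// Compare a float reduction against a double-precision reference.
// A noticeably larger error when run under ATEN_CPU_CAPABILITY=default
// points at the non-vectorized reduction path.
int main() {
  torch::manual_seed(0);
  auto x = torch::randn({4096, 4096});
  auto ref = x.to(torch::kDouble).sum();   // high-precision reference
  auto got = x.sum().to(torch::kDouble);   // float reduction under test
  std::cout << "abs error: " << (ref - got).abs().item<double>() << "\n";
  return 0;
}
```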
@malfet We didn't consider the case where the input and gradient memory formats are not aligned. Can we move the `.contiguous` call into the `native_group_norm_backward` function in `aten/src/ATen/native/group_norm.cpp` and make the memory...
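A minimal sketch of the idea (a standalone helper using the public libtorch API, not the actual ATen change; the helper name and its placement are assumptions): align the incoming gradient with the input's suggested memory format before the backward kernel runs, so callers don't have to.

```cpp
#include <torch/torch.h>

// Hypothetical helper illustrating the proposal: if X is channels-last,
// suggest_memory_format() returns ChannelsLast and the gradient is made
// contiguous in that format, so the kernel sees matching layouts.
torch::Tensor align_grad_to_input(const torch::Tensor& dY,
                                  const torch::Tensor& X) {
  return dY.contiguous(X.suggest_memory_format());
}
```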
It depends on https://github.com/pytorch/pytorch/pull/97430