Rohanjames1997
All of these changes are required to add a unit test (UT) for the NEON implementation of `vec_reduce_all` that I introduced in #105590. This enables reuse of the existing tests in `aten/src/ATen/test/vec_test_all_types.cpp`. As...
Fixes #104729. As suggested in the [blog](https://dev-discuss.pytorch.org/t/torchinductor-update-5-cpu-backend-backend-performance-update-and-deep-dive-on-key-optimizations/1117#:~:text=It%20can%20be,sub%2Dclasses.), I subclassed the `VecISA` class and implemented a NEON version of the `vec_reduce_all()` function to go along with the existing AVX2 and AVX512...
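For readers unfamiliar with what a `vec_reduce_all()`-style function does, here is a minimal, portable sketch of the general idea: accumulate lane-wise across a buffer, then collapse the lanes of the accumulator pairwise into one scalar. This is not the PyTorch implementation and uses no NEON intrinsics. `Vec4`, `reduce_all_lanes`, and `reduce_buffer` are hypothetical names chosen for illustration, and the 4-lane width is an assumption that mirrors a 128-bit register holding four floats.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4-lane float vector register
// (the role a 128-bit NEON register would play on aarch64).
using Vec4 = std::array<float, 4>;

// Collapse the four lanes into one scalar with a pairwise (tree) scheme:
// first combine lanes (0, 2) and (1, 3), then combine the two partial results.
template <typename Op>
float reduce_all_lanes(const Vec4& v, Op op) {
    const float even = op(v[0], v[2]);
    const float odd = op(v[1], v[3]);
    return op(even, odd);
}

// Reduce a buffer whose length is a non-zero multiple of 4.
// The loop keeps four independent lane accumulators and only does the
// horizontal (cross-lane) reduction once, at the very end.
template <typename Op>
float reduce_buffer(const float* data, std::size_t n, Op op) {
    assert(n >= 4 && n % 4 == 0);
    Vec4 acc{data[0], data[1], data[2], data[3]};
    for (std::size_t i = 4; i < n; i += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane) {
            acc[lane] = op(acc[lane], data[i + lane]);
        }
    }
    return reduce_all_lanes(acc, op);
}
```

Passing the combining operation as a callable lets the same routine cover sum, max, or product reductions. The sketch assumes `op` is associative and commutative, since the pairwise order differs from a left-to-right scalar loop; for floating-point addition this can change rounding slightly.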
Tested along with https://github.com/openxla/xla/pull/16527
### Proposed new feature or change:

Similar to how https://github.com/numpy/numpy/pull/21955 vectorized umath functions using AVX512 FP16, I'm interested in leveraging NEON/SVE to get similar benefits for aarch64 processors. I'd be...
This PR enables CI on GitHub-hosted arm64 runners, which are now [available for free](https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/) in public repositories. Related to #11275.