divya2108 comments

Results: 10 comments of divya2108

Added SVE implementation to improve the performance on ARM architecture

Hi @trivialfis, the code has been thoroughly validated to ensure alignment and padding issues are addressed. The datatypes have not been altered from the scalar code; instead, the original scalar...

Added SVE implementation to improve the performance on ARM architecture

Specialized allocators like aligned_alloc() don't help with SVE intrinsics because: 1. ARM's SVE SIMD architecture handles data processing in parallel, which inherently considers data alignment. For example, for a 256...

Added SVE implementation to improve the performance on ARM architecture

Hi @trivialfis, additionally, SVE provides predicate registers enabling key features such as: a) **Per-lane predication** that allows SIMD instructions to be executed conditionally on specific lanes of a SIMD...
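
To make the per-lane predication point concrete, here is a minimal sketch of a predicated loop. It is not code from the PR: `AddInPlace` is a hypothetical helper, and the scalar fallback is only there so the snippet also builds on non-SVE targets. The predicate produced by `svwhilelt_b32` switches off the lanes that would run past `n`, so no separate scalar tail loop is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not from the PR): a[i] += b[i] for i in [0, n).
void AddInPlace(float* a, const float* b, std::size_t n) {
  assert(a != nullptr && b != nullptr);
#if defined(__ARM_FEATURE_SVE)
  // svcntw() is the number of 32-bit lanes in one SVE vector on this CPU.
  for (std::size_t i = 0; i < n; i += svcntw()) {
    // Lane k of pg is active only while i + k < n, so the final
    // iteration masks off the out-of-range lanes.
    svbool_t pg = svwhilelt_b32(static_cast<std::uint64_t>(i),
                                static_cast<std::uint64_t>(n));
    svfloat32_t va = svld1_f32(pg, a + i);  // inactive lanes are not loaded
    svfloat32_t vb = svld1_f32(pg, b + i);
    svst1_f32(pg, a + i, svadd_f32_m(pg, va, vb));  // inactive lanes are not stored
  }
#else
  // Scalar fallback for targets compiled without SVE support.
  for (std::size_t i = 0; i < n; ++i) {
    a[i] += b[i];
  }
#endif
}
```

The same source works for any SVE vector length, because the loop step and the predicate are both derived at run time rather than hard-coded.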

Added SVE implementation to improve the performance on ARM architecture

> Started looking into this PR today. Thank you for working on using the arm intrinsic, but could you please add **detailed** code comments and extract the code into an...

Added SVE implementation to improve the performance on ARM architecture

> Is SVE guaranteed to be available for ARM implementation?

No, SVE is not guaranteed to be available on all ARM implementations. While the ARMv8-A architecture, which includes SVE support, is...

Added SVE implementation to improve the performance on ARM architecture

> > > The CMake logic looks right. It only compiles SVE code when the compiler supports it and during the runtime it triggers the SVE code only when the...

Added SVE implementation to improve the performance on ARM architecture

> Sorry for the slow reply, got stuck at some other work lately. One question, is it possible to reduce the call frequency of `check_sve_hw_support` to maybe once per training...
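
One way to limit how often a hardware check like this runs is to cache its result in a function-local static, which C++11 and later initialize exactly once, in a thread-safe manner. The sketch below is an assumption about how that could look, not the PR's implementation: `ProbeSveSupport` is a hypothetical stand-in for `check_sve_hw_support`, and `g_probe_calls` exists only to show how often the probe runs. It assumes Linux on AArch64 for the real check and reports `false` everywhere else.

```cpp
#include <atomic>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_SVE
#endif

// Demonstration-only counter of how many times the probe actually ran.
static std::atomic<int> g_probe_calls{0};

// Hypothetical stand-in for the PR's check_sve_hw_support.
bool ProbeSveSupport() {
  ++g_probe_calls;
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
  // The kernel reports SVE availability through the ELF hwcap bits.
  return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
  // Assumption: treat every other platform as "no SVE".
  return false;
#endif
}

// The probe runs on the first call only; later calls return the cached value.
bool SveSupportedCached() {
  static const bool supported = ProbeSveSupport();
  return supported;
}
```

Callers on hot paths would then use `SveSupportedCached()`, so the cost of the probe is paid once per process rather than once per call.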

Added SVE implementation to improve the performance on ARM architecture

Hi @trivialfis, just wanted to follow up on the code review. Let me know if you need any additional details or clarifications.

Added SVE implementation to improve the performance on ARM architecture

> Could you please share the CPU you were using for the benchmarks? I ran a benchmark on a Grace machine (I work for NVIDIA) with synthetic data, and the...

Added SVE implementation to improve the performance on ARM architecture

Hi @trivialfis, just wanted to follow up on the PR review. Let me know if there’s anything I can do to help or clarify. Thanks!