divya2108
Hi @trivialfis, the code has been thoroughly validated to ensure alignment and padding issues are addressed. The datatypes have not been altered from the scalar code; instead, the original scalar...
Specialized allocators like `aligned_alloc()` don't help with SVE intrinsics because: 1. ARM's SVE SIMD architecture handles data processing in parallel, which inherently considers data alignment. For example, for a 256...
Hi @trivialfis, additionally, SVE provides predicate registers that enable key features such as: a) **Per-lane predication**, which allows SIMD instructions to be executed conditionally on specific lanes of a SIMD...
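To make the predication and alignment points concrete, here is a minimal sketch, not the code from this PR. `SumFloats` is a hypothetical helper. It assumes the compiler defines `__ARM_FEATURE_SVE` when SVE is enabled and falls back to a scalar loop otherwise. The predicate from `svwhilelt_b32` switches off the lanes past the end of the array, so the tail needs no padding, and `svld1_f32` only needs element-aligned pointers.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper for illustration: sums n floats starting at data.
float SumFloats(const float* data, std::size_t n) {
#if defined(__ARM_FEATURE_SVE)
  svfloat32_t acc = svdup_n_f32(0.0f);
  // svcntw() is the number of 32-bit lanes in one vector on this CPU.
  for (std::size_t i = 0; i < n; i += svcntw()) {
    // Lanes whose index is below n are active; the rest are inactive.
    svbool_t pg = svwhilelt_b32(static_cast<std::uint64_t>(i),
                                static_cast<std::uint64_t>(n));
    // Inactive lanes are not read from memory.
    svfloat32_t v = svld1_f32(pg, data + i);
    // Merging add: inactive lanes keep their previous accumulator value.
    acc = svadd_f32_m(pg, acc, v);
  }
  // Horizontal reduction over all lanes of the accumulator.
  return svaddv_f32(svptrue_b32(), acc);
#else
  // Scalar fallback so the sketch also builds without SVE.
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    acc += data[i];
  }
  return acc;
#endif
}
```

Calling `SumFloats(buf + 1, 11)` on a buffer from a plain allocator works the same way as on an aligned one; only the tail predicate changes with the length.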
> Started looking into this PR today. Thank you for working on using the arm intrinsic, but could you please add **detailed** code comments and extract the code into an...
> Is SVE guaranteed to be available for ARM implementation? No, SVE is not guaranteed to be available on all ARM implementations. While ARMv8-A architecture, which includes SVE support, is...
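Since SVE is optional, a runtime probe is needed before dispatching to SVE code. The following is a rough sketch of one way to do this on Linux/AArch64 using `getauxval`; it is not necessarily how the PR's check is implemented. `CpuHasSve` is a hypothetical name, and the fallback `HWCAP_SVE` value is taken from the Linux arm64 hwcap ABI in case the header does not provide it.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)  // bit position from the Linux arm64 hwcap ABI
#endif
#endif

// Hypothetical probe: true only if the kernel reports SVE for this process.
bool CpuHasSve() {
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
  // Other platforms: report no SVE so callers take the scalar path.
  return false;
#endif
}
```

On other operating systems a different mechanism would be needed; the sketch simply reports `false` there.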
> > > The CMake logic looks right. It only compiles SVE code when the compiler supports it and during the runtime it triggers the SVE code only when the...
> Sorry for the slow reply, got stuck at some other work lately. One question, is it possible to reduce the call frequency of `check_sve_hw_support` to maybe once per training...
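One possible way to reduce the call frequency is to cache the result in a function-local static, which C++11 initializes once in a thread-safe manner. This is a sketch under assumptions: the thread does not show the signature of `check_sve_hw_support`, so `check_sve_hw_support_stub` below is a stand-in, and `SveSupportedCached` and `g_probe_calls` are hypothetical names used only to show that the probe runs once.

```cpp
#include <atomic>

// Illustration only: counts how often the probe actually runs.
std::atomic<int> g_probe_calls{0};

// Stand-in for the PR's check_sve_hw_support (real signature not shown here).
bool check_sve_hw_support_stub() {
  ++g_probe_calls;
  return false;  // placeholder result; a real probe would query the hardware
}

// Returns the cached result; the probe runs only on the first call.
bool SveSupportedCached() {
  static const bool supported = check_sve_hw_support_stub();
  return supported;
}
```

Hot loops can then call `SveSupportedCached()` freely, or the caller can read it once per training session and pass the flag down.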
Hi @trivialfis, just wanted to follow up on the code review. Let me know if you need any additional details or clarifications.
> Could you please share the CPU you were using for the benchmarks? I ran a benchmark on a Grace machine (I work for NVIDIA) with synthetic data, and the...
Hi @trivialfis, just wanted to follow up on the PR review. Let me know if there’s anything I can do to help or clarify. Thanks!