divya2108

Results: 10 comments by divya2108

Hi @trivialfis The code has been thoroughly validated to ensure alignment and padding issues are addressed. The datatypes have not been altered from the scalar code; instead, the original scalar...

Specialized allocators like aligned_alloc() don't help with SVE intrinsics because: 1. ARM's SVE SIMD architecture handles data processing in parallel, which inherently considers data alignment. For example, for a 256...

Hi @trivialfis, additionally, SVE provides predicate registers that enable key features such as: a) **Per-lane predication** that allows SIMD instructions to be executed conditionally on specific lanes of a SIMD...

> Started looking into this PR today. Thank you for working on using the arm intrinsic, but could you please add **detailed** code comments and extract the code into an...

> Is SVE guaranteed to be available for ARM implementation? No, SVE is not guaranteed to be available on all ARM implementations. While ARMv8-A architecture, which includes SVE support, is...

> > > The CMake logic looks right. It only compiles SVE code when the compiler supports it and during the runtime it triggers the SVE code only when the...

> Sorry for the slow reply, got stuck at some other work lately. One question, is it possible to reduce the call frequency of `check_sve_hw_support` to maybe once per training...
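One common way to run a hardware check only once per process, rather than on every call, is to cache its result in a function-local static. The sketch below is a minimal illustration under stated assumptions and is not the PR's actual code: `ProbeSveSupport`, `CachedSveSupport`, and `g_probe_calls` are hypothetical names, and the probe returns a fixed value so the example stays portable.

```cpp
#include <atomic>

// Hypothetical counter, used only to show how often the probe actually runs.
static std::atomic<int> g_probe_calls{0};

// Hypothetical stand-in for an expensive hardware check.
// On Linux/AArch64 a real probe could test getauxval(AT_HWCAP) & HWCAP_SVE;
// a fixed value is returned here so the sketch compiles on any platform.
bool ProbeSveSupport() {
  ++g_probe_calls;
  return false;
}

// Returns the cached result. The function-local static is initialized
// exactly once, and that initialization is thread-safe since C++11.
bool CachedSveSupport() {
  static const bool supported = ProbeSveSupport();
  return supported;
}
```

With this pattern, callers on a hot path can call `CachedSveSupport()` freely, since the probe itself runs only on the first call.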

Hi @trivialfis, just wanted to follow up on the code review. Let me know if you need any additional details or clarifications.

> Could you please share the CPU you were using for the benchmarks? I ran a benchmark on a Grace machine (I work for NVIDIA) with synthetic data, and the...

Hi @trivialfis, just wanted to follow up on the PR review. Let me know if there’s anything I can do to help or clarify. Thanks!