Daniel Arndt

Results: 791 comments by Daniel Arndt

I don't see an impact on my Mac M1 (unless the "likely" version is used, which gives me a 1.9x slowdown). This might fix or improve the issue reported in https://github.com/kokkos/kokkos/issues/5581.

On `Sapphire Rapids` with a recent `oneAPI` compiler, I see a 1.6x speed-up.

Looks like the MST algorithm (`ArborX::MST::find_component_nearest_neighbors`) already gets a good workgroup size (we might get a couple of percent better performance by choosing 128 or 256 instead of 32).

There are known issues with building in `Debug` mode for SYCL+CUDA. Building in `Release` mode should work much better.

Also, feel free to reach out to me for `SYCL` issues.

See https://github.com/intel/llvm/issues/5980.

Fixed by https://github.com/dealii/dealii/pull/14811.

> Well someone could make that function. I can give it a jab in a few weeks.

You mean something other than `Utilities::fixed_power` or `Utilities::pow`?
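
For readers unfamiliar with the two helpers named above, here is a minimal sketch of what an integer-power utility of this kind typically looks like. It is not the deal.II implementation: the namespace `sketch` and the names `int_pow` and `fixed_int_pow` are hypothetical, and the sketch assumes a non-negative exponent and C++14 `constexpr` rules.

```cpp
namespace sketch
{
  // Hypothetical helper (not the deal.II code): computes base^iexp for a
  // non-negative integer exponent using repeated squaring.
  template <typename T>
  constexpr T
  int_pow(T base, unsigned int iexp)
  {
    T result = T(1);
    while (iexp > 0)
      {
        // Multiply in the current power of base if the lowest bit is set.
        if (iexp % 2 == 1)
          result *= base;
        iexp /= 2;
        // Square only if another iteration follows, to avoid a needless
        // (and possibly overflowing) final multiplication.
        if (iexp > 0)
          base *= base;
      }
    return result;
  }

  // Hypothetical variant with the exponent fixed at compile time.
  template <unsigned int N, typename T>
  constexpr T
  fixed_int_pow(const T base)
  {
    return int_pow(base, N);
  }
} // namespace sketch
```

With these definitions, `sketch::int_pow(2, 10)` and `sketch::fixed_int_pow<3>(2.0)` can both be evaluated at compile time. Whether a runtime or compile-time exponent is the better fit depends on the call site.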

Yes, that's difficult to work around if we still want the runtime to find a good workgroup size. I'm not sure if this is causing any real issues yet, though.

Was there a good reason for this restriction anyway?