Carl Pearson comments

135 comments by Carl Pearson

MPI_Pack with device memory

I dropped it because I interpreted it to mean that it just enabled some assertions and tests, but now I see that the little benchmarks are referred to as "tests"...

add multi-threaded explicit transfer benchmarks

Or perhaps just for pageable allocations

Serial backend: performance regression

## Executive summary

1. `std::lock_guard` does seem to cause a slowdown, though not always.
2. Bad AVX-512 codegen in GCC 10 and 11 makes it way worse.

## Long Version...

Serial backend: performance regression

I tried this, which basically performed the same as `std::lock_guard`/`std::mutex`:

```c++
class GCCSpinLock {
  int lock_var;
 public:
  GCCSpinLock() : lock_var(0) {}
  void lock() {
    while (__sync_lock_test_and_set(&lock_var, 1)) {
      // Spin...
```
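For reference, a complete test-and-set spinlock in the same style might look like the sketch below. It is not the code from the comment: the `unlock()` body (using GCC's `__sync_lock_release`) and the `locked_increment_count` helper are assumptions added for illustration. Because the class exposes `lock()`/`unlock()`, it can be dropped into `std::lock_guard` in place of `std::mutex`, which is what makes the two easy to swap for a comparison.

```c++
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Test-and-set spinlock on the GCC/Clang __sync builtins.
// unlock() is an assumed completion; the original snippet is truncated.
class GCCSpinLock {
  int lock_var;

 public:
  GCCSpinLock() : lock_var(0) {}

  void lock() {
    // Atomically writes 1 and returns the previous value (acquire barrier).
    // A nonzero result means another thread already holds the lock.
    while (__sync_lock_test_and_set(&lock_var, 1)) {
      // Spin until the holder releases the lock.
    }
  }

  void unlock() {
    // Writes 0 with release semantics.
    __sync_lock_release(&lock_var);
  }
};

// Hypothetical helper: each thread increments a shared counter `iters`
// times under std::lock_guard<GCCSpinLock>, then the total is returned.
long long locked_increment_count(int num_threads, int iters) {
  assert(num_threads > 0 && iters >= 0);
  GCCSpinLock spin;
  long long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        std::lock_guard<GCCSpinLock> guard(spin);
        ++counter;
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

If the lock is correct, no increments are lost, so the returned count is always `num_threads * iters`.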

Serial backend: performance regression

What you suggested performs about the same as the `GCCSpinLock` I posted above (and as `std::lock_guard`/`std::mutex`).

Serial backend: performance regression

Using perf has not added much insight (GCC 10.2.0). Sure enough, Kokkos 4.4 takes 4G more cycles to complete the same number of instructions, but none of the...
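Same instruction count but more cycles is the same thing as a lower IPC (instructions per cycle). Writing $N$ for the instruction count (equal for both versions), $C_{4.3}$ for the Kokkos 4.3 cycle count (not given in this excerpt), and reading "4G" as roughly $4\times 10^{9}$:

$$
\frac{\mathrm{IPC}_{4.4}}{\mathrm{IPC}_{4.3}}
= \frac{N / (C_{4.3} + 4\times 10^{9})}{N / C_{4.3}}
= \frac{C_{4.3}}{C_{4.3} + 4\times 10^{9}} < 1
$$

So the extra time shows up as cycles in which fewer instructions retire, not as additional work being executed.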

Serial backend: performance regression

Valgrind isn't working for me (latest release, Valgrind 3.23.0, GCC 14.2.0) for Kokkos 4.3 or 4.4 compiled with `-march=native -mtune=native`:

```
4.4-patched
==928== Callgrind, a call-graph generating cache profiler
==928==...
```

Serial backend: performance regression

*edit*: this is because Valgrind only supports up through AVX2. If I do a release build of Kokkos 4.3 without the `native` flags, Valgrind runs, but reports an invalid read...
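Since `-march=native` on an AVX-512-capable machine lets GCC emit AVX-512 instructions, one way to tell which kind of build you have is to check the compiler's predefined ISA macros. A minimal sketch: the function names are hypothetical, while `__AVX512F__` and `__AVX2__` are the macros GCC and Clang define when those extensions are enabled.

```c++
// Hypothetical helper: reports the widest x86 SIMD extension the compiler
// was allowed to use for this translation unit. This reflects compile-time
// flags (e.g. -march=native), not the CPU the binary later runs on.
constexpr const char* compiled_simd_level() {
#if defined(__AVX512F__)
  return "AVX-512F";  // binary may contain AVX-512 instructions
#elif defined(__AVX2__)
  return "AVX2";
#else
  return "below AVX2";
#endif
}

// True when AVX-512 code generation was enabled at compile time.
constexpr bool built_with_avx512() {
#if defined(__AVX512F__)
  return true;
#else
  return false;
#endif
}
```

With GCC's default x86-64 target (no `native` flags), this reports "below AVX2", which corresponds to the kind of build that Valgrind was able to run.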

Serial backend: performance regression

I think there's a bug in the reproducer. Note that `i` ranges over [`0`, `_iend`):

```c++
for (int i = 0; i < _iend; ++i) {
  // ...
  _output(cl, bf, pt, i)...
```

Serial backend: performance regression

> My best bet would be that we are missing out on some compiler optimizations due to the lock and that it's not the lock itself that makes the difference....
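As a rough illustration of the hypothesis in the quote (hypothetical code, not the Kokkos implementation), compare a lock acquired on every iteration with one acquired once around the loop. Both compute the same result; the difference is only in what the compiler is allowed to do with the loop body.

```c++
#include <cassert>
#include <mutex>
#include <vector>

// Lock inside the loop: lock()/unlock() are synchronization points, so the
// compiler generally cannot keep `out` in a register across iterations;
// each iteration has to load `out` after locking and store it before
// unlocking.
void sum_locked_per_iteration(const std::vector<double>& x, double& out,
                              std::mutex& m) {
  for (double v : x) {
    std::lock_guard<std::mutex> guard(m);
    out += v;
  }
}

// Lock once around the loop: the loop body contains no synchronization,
// so the running sum can live in a register for the whole loop.
void sum_locked_once(const std::vector<double>& x, double& out,
                     std::mutex& m) {
  std::lock_guard<std::mutex> guard(m);
  double local = out;
  for (double v : x) local += v;  // same addition order as above
  out = local;
}
```

Comparing the generated assembly of the two functions is one way to check whether the lock placement, rather than the cost of locking itself, changes the code in the hot loop.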