Results 21 comments of tzcnt

This machine definitely has lock-free 128-bit atomics. However, HPX did not detect them automatically. I am using Clang, and it has known issues with this: https://github.com/llvm/llvm-project/issues/75081. I assumed that if...
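A minimal C++17 sketch of asking the compiler, rather than the CPU, whether a 16-byte atomic is lock-free. `tagged_ptr` is a hypothetical type used only for illustration. `is_always_lock_free` is a compile-time property, so on the same hardware its value can change with the compiler and flags (for example `-mcx16` on x86-64). That is one way a build-time detection step can disagree with what the machine actually supports.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 16-byte, trivially copyable type, e.g. a pointer plus an ABA tag.
struct alignas(16) tagged_ptr {
    std::uint64_t ptr;
    std::uint64_t tag;
};

static_assert(sizeof(tagged_ptr) == 16, "expected a 16-byte type");

// Compile-time answer for the 16-byte type. This reflects what the compiler
// can guarantee for the current target and flags, not what the CPU supports.
constexpr bool wide_atomic_always_lock_free =
    std::atomic<tagged_ptr>::is_always_lock_free;

// For comparison: an 8-byte atomic, lock-free on mainstream 64-bit targets.
constexpr bool word_atomic_always_lock_free =
    std::atomic<std::uint64_t>::is_always_lock_free;
```

Printing `wide_atomic_always_lock_free` under different compilers and flag sets shows how the answer shifts. The runtime `is_lock_free()` member is a separate query and, for 16-byte types, may require linking with `-latomic`.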

After further profiling, I believe the issue is `madvise(MADV_DONTNEED)` causing inter-processor interrupts (IPIs) and TLB flushes on the other processors. The issue seems similar to the one discussed at https://github.com/bytecodealliance/wasmtime issue...
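A minimal Linux-only sketch of the pattern that makes `MADV_DONTNEED` expensive. It assumes Linux semantics for private anonymous mappings and does not reproduce HPX's stack code; `dontneed_discards_pages` is a hypothetical helper. It maps a stack-like region, dirties it, and releases it. Because the kernel really drops the pages, it has to invalidate TLB entries on other CPUs that may hold this address space, and the next touch page-faults back in a fresh zero page.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Returns true if the pages read back as zero after MADV_DONTNEED,
// which is what Linux guarantees for private anonymous mappings.
bool dontneed_discards_pages(std::size_t pages) {
    const long page = sysconf(_SC_PAGESIZE);
    const std::size_t len = pages * static_cast<std::size_t>(page);

    // Private anonymous mapping, similar to a separately mmap'd thread stack.
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    auto* bytes = static_cast<unsigned char*>(mem);
    std::memset(bytes, 0xAB, len);  // touch every page so it is backed

    // Drop the physical pages. The kernel unmaps them and must shoot down
    // stale TLB entries on other CPUs running this process (typically via IPIs).
    if (madvise(mem, len, MADV_DONTNEED) != 0) {
        munmap(mem, len);
        return false;
    }

    // Each read below faults in a new zero-filled page.
    bool all_zero = true;
    for (std::size_t i = 0; i < len; ++i) {
        if (bytes[i] != 0) { all_zero = false; break; }
    }
    munmap(mem, len);
    return all_zero;
}
```

A runtime that recycles stacks without returning their pages to the OS avoids both the syscall and the cross-CPU shootdown, at the cost of keeping that memory resident.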

I got a 3-10x speedup by setting `HPX_WITH_THREAD_STACK_MMAP=OFF`. This removes all the syscall time from the perf trace. I will now start the process of benchmarking the comparative...

After setting `HPX_WITH_THREAD_STACK_MMAP=OFF` and experimenting with every combination of build-time parameters and schedulers, I've uploaded new benchmark implementations that improve HPX performance to some degree. The different schedulers didn't...

Due to differences in the library APIs, they don't all measure exactly the same behavior. I tried to get the implementations as close as possible, but the API differences make...

That makes a marked improvement in performance. I'm doing a full benchmark run and will report back with the results shortly. A couple of notes on the implementation:
- I...

New benchmark results based on the development version are temporarily available at https://fleetcode.com/runtime-benchmarks/release_test/. You can compare them to the original results at https://fleetcode.com/runtime-benchmarks, which were generated from this commit: https://github.com/tzcnt/runtime-benchmarks/commit/658a8ca7cb697c465b955c3a4e5b1dd047158631

The EPYC 7742 has multiple independent L3 caches, each shared by 4 cores, so it could be considered a non-uniform cache access ("NUCA") architecture. See https://www.anandtech.com/show/16529/amd-epyc-milan-review/4 for reference. My machine has the option...
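On Linux, one way to see which cores share an L3 is the sysfs cache topology, which hwloc also builds on. A minimal sketch, assuming the standard `/sys/devices/system/cpu/cpuN/cache/indexK/shared_cpu_list` files; `index3` is usually the L3 on x86, but the sibling `level` file is the authoritative check. `parse_cpu_list` and `cache_siblings` are hypothetical helper names. With SMT enabled, the list also contains the hyperthread siblings of each core.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse a Linux cpulist string such as "0-3,64-67" into {0,1,2,3,64,65,66,67}.
std::vector<int> parse_cpu_list(const std::string& list) {
    std::vector<int> cpus;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;
        const auto dash = item.find('-');
        const int first = std::stoi(item.substr(0, dash));
        const int last = (dash == std::string::npos)
                             ? first
                             : std::stoi(item.substr(dash + 1));
        for (int c = first; c <= last; ++c) cpus.push_back(c);
    }
    return cpus;
}

// CPUs sharing `cpu`'s cache at sysfs index `index` (3 is usually the L3).
// Returns an empty vector if the file is missing (e.g. not on Linux).
std::vector<int> cache_siblings(int cpu, int index = 3) {
    std::ifstream in("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/cache/index" + std::to_string(index) +
                     "/shared_cpu_list");
    std::string line;
    if (!std::getline(in, line)) return {};
    return parse_cpu_list(line);
}
```

Grouping workers by the result of `cache_siblings` is one way a scheduler could prefer stealing work from threads that share an L3 before crossing to another cache.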

The number of threads in a single `ex_cpu` will be clamped to 64. This happens if you call `set_thread_occupancy()` or use the automatic thread configuration (with, or without, hwloc). If...
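The comment does not say why the limit is 64. One common design that produces exactly this limit is tracking per-worker state (such as "idle") in a single 64-bit word, so each worker maps to one bit. The sketch below is a hypothetical illustration of that design with invented names; it is not the library's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// One bit per worker in a 64-bit mask means at most 64 workers.
constexpr std::size_t max_workers = 64;

// Clamp a requested thread count to what the mask can represent.
std::size_t clamp_thread_count(std::size_t requested) {
    return std::min(requested, max_workers);
}

// Mark worker i as idle or busy by setting/clearing its bit.
std::uint64_t set_idle(std::uint64_t mask, std::size_t i) {
    return mask | (std::uint64_t{1} << i);
}
std::uint64_t set_busy(std::uint64_t mask, std::size_t i) {
    return mask & ~(std::uint64_t{1} << i);
}

// Index of the lowest-numbered idle worker, or -1 if none is idle.
int first_idle(std::uint64_t mask) {
    for (int i = 0; i < 64; ++i) {
        if (mask & (std::uint64_t{1} << i)) return i;
    }
    return -1;
}
```

The appeal of this layout is that a single atomic load or compare-and-swap on one word can answer "is anyone idle?", which stops working once the pool needs more than 64 bits.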

IIUC, `task_container` is like `tbb::task_group`? I actually found this type of construction very useful; sometimes it's nice to construct this group imperatively rather than passing all subtasks at...
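To illustrate the imperative style being described, here is a minimal sketch of a group that accepts subtasks one at a time and joins them later, mirroring the `run()`/`wait()` shape of `tbb::task_group`. `simple_task_group` is hypothetical and uses `std::async` purely for illustration; it is neither TBB's implementation nor `task_container`.

```cpp
#include <exception>
#include <future>
#include <utility>
#include <vector>

// Hypothetical task group: add subtasks imperatively, then wait for all of them.
class simple_task_group {
public:
    // Start a subtask right away; any return value of f is discarded.
    template <typename F>
    void run(F&& f) {
        pending_.push_back(std::async(std::launch::async,
                                      [fn = std::forward<F>(f)]() mutable { fn(); }));
    }

    // Wait for every subtask added so far. All subtasks are joined first;
    // then the first exception (in insertion order), if any, is rethrown.
    void wait() {
        std::exception_ptr first;
        for (auto& fut : pending_) {
            try {
                fut.get();
            } catch (...) {
                if (!first) first = std::current_exception();
            }
        }
        pending_.clear();
        if (first) std::rethrow_exception(first);
    }

private:
    std::vector<std::future<void>> pending_;
};
```

Usage: subtasks can be added inside a loop or behind conditionals, which is awkward when all of them must be passed to a single call up front.

```cpp
std::vector<int> squares(10);
simple_task_group g;
for (int i = 0; i < 10; ++i) {
    g.run([&squares, i] { squares[i] = i * i; });  // each task writes its own slot
}
g.wait();
```

Because `std::async` with `std::launch::async` typically starts one thread per subtask, this only shows the interface shape; a real runtime would schedule subtasks onto a worker pool.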