Tobias Ribizel

105 issues by Tobias Ribizel

This improves symbolic Cholesky performance by preprocessing the matrix on the GPU with a minimum spanning tree algorithm. Example: rgg_22 from SuiteSparse with METIS nested dissection on H100: *...
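The excerpt does not say how the MST is used, so the following rests on an assumption: one well-known link between spanning trees and symbolic Cholesky is that processing the edges of the matrix graph in increasing order of max(i, j) (Kruskal's algorithm with edge weight max(i, j)) is exactly Liu's union-find algorithm for the elimination tree. The sequential sketch below shows that equivalence. The helper `elimination_tree` is hypothetical, and the PR's GPU MST algorithm is not reproduced here.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Minimal union-find with path halving.
struct DisjointSets {
    std::vector<int> rep;
    explicit DisjointSets(int n) : rep(n) { std::iota(rep.begin(), rep.end(), 0); }
    int find(int x)
    {
        while (rep[x] != x) {
            rep[x] = rep[rep[x]];  // path halving
            x = rep[x];
        }
        return x;
    }
};

// Hypothetical helper: elimination tree of a symmetric sparsity pattern in CSR.
// Only strictly lower-triangular entries (col < row) are read, so the lower
// triangle must be stored. Returns parent[j], or -1 for roots.
std::vector<int> elimination_tree(int n, const std::vector<int>& row_ptrs,
                                  const std::vector<int>& col_idxs)
{
    assert(static_cast<int>(row_ptrs.size()) == n + 1);
    std::vector<int> parent(n, -1);
    DisjointSets sets(n);
    // top[r]: largest vertex (current tree root) of the component represented by r
    std::vector<int> top(n);
    std::iota(top.begin(), top.end(), 0);
    // Visiting rows in increasing order visits edges in increasing max(row, col).
    for (int row = 0; row < n; ++row) {
        for (int nz = row_ptrs[row]; nz < row_ptrs[row + 1]; ++nz) {
            const int col = col_idxs[nz];
            if (col >= row) {
                continue;
            }
            const int rep_col = sets.find(col);
            const int rep_row = sets.find(row);
            if (rep_col == rep_row) {
                continue;  // would close a cycle: Kruskal discards this edge
            }
            // (col, row) is a spanning-tree edge; the matching elimination
            // tree edge links the current root of col's component to row.
            parent[top[rep_col]] = row;
            sets.rep[rep_col] = rep_row;
            top[rep_row] = row;
        }
    }
    return parent;
}
```

Only the kept edges change `parent`, so a spanning forest with at most n - 1 edges carries everything the elimination tree computation needs from the original pattern.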

reg:testing
reg:benchmarking
1:ST:ready-for-review
type:factorization
reg:helper-scripts
mod:all

This adds a primitive that allows the distribution of variable-sized chunks of work across a warp for better memory coalescing and warp utilization. This can be used as a component...
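The description is truncated, so this is only a sketch of one common way to spread variable-sized chunks over a warp, not the PR's primitive: an exclusive prefix sum over the chunk sizes plus a binary search per lane, simulated on the host. The 32-lane warp width, `WorkItem`, and `distribute_over_warp` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: 32-wide warps as on NVIDIA GPUs; AMD wavefronts are often 64 wide.
constexpr int warp_size = 32;

// One unit of work: the chunk it belongs to and its position inside that chunk.
struct WorkItem {
    int chunk;
    int offset;
};

// Hypothetical host-side simulation: flat work index round * warp_size + lane
// is handled by that lane. If the chunks are stored back to back (as CSR rows
// are), consecutive lanes touch consecutive elements even across chunk boundaries.
std::vector<std::vector<WorkItem>> distribute_over_warp(const std::vector<int>& chunk_sizes)
{
    // Exclusive prefix sum: chunk c owns flat indices [offsets[c], offsets[c + 1]).
    std::vector<int> offsets(chunk_sizes.size() + 1, 0);
    for (std::size_t c = 0; c < chunk_sizes.size(); ++c) {
        assert(chunk_sizes[c] >= 0);
        offsets[c + 1] = offsets[c] + chunk_sizes[c];
    }
    const int total = offsets.back();
    std::vector<std::vector<WorkItem>> rounds;
    for (int base = 0; base < total; base += warp_size) {
        std::vector<WorkItem> round;  // entry i is handled by lane i
        for (int lane = 0; lane < warp_size && base + lane < total; ++lane) {
            const int flat = base + lane;
            // Owning chunk = last chunk starting at or before flat;
            // upper_bound automatically skips empty chunks.
            const auto it = std::upper_bound(offsets.begin(), offsets.end(), flat);
            const int chunk = static_cast<int>(it - offsets.begin()) - 1;
            round.push_back({chunk, flat - offsets[chunk]});
        }
        rounds.push_back(std::move(round));
    }
    return rounds;
}
```

Every lane stays busy except in the last round, regardless of how unevenly the work is split between chunks. On a GPU, the per-lane binary search could be replaced by warp-level scan or ballot operations.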

reg:build
reg:testing
mod:core
mod:cuda
mod:hip

Extracted from the symbolic Cholesky (but possibly also useful elsewhere, e.g. for parallel OpenMP COO sorting).

reg:build
reg:testing
mod:cuda
mod:openmp
1:ST:ready-for-review
mod:hip

This adds another optional column to the `ProfilerHook::create_(nested_)summary` logger that computes memory bandwidths/FLOPS/custom rates for kernels with work estimates. Merge stack:
- [x] #1782
- [ ] #802
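A rate column like this presumably divides estimated work by measured runtime. The sketch below assumes that; `WorkEstimate` and `SummaryEntry` are hypothetical names, not the `ProfilerHook` API. It accumulates work and time over all invocations of a kernel before dividing.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical work estimate attached to one kernel invocation.
struct WorkEstimate {
    double bytes = 0.0;  // estimated memory traffic
    double flops = 0.0;  // estimated floating-point operations
};

// Hypothetical summary row: accumulated work and runtime of one kernel.
struct SummaryEntry {
    WorkEstimate work;
    std::chrono::duration<double> elapsed{0.0};

    void add_invocation(const WorkEstimate& w, std::chrono::duration<double> t)
    {
        work.bytes += w.bytes;
        work.flops += w.flops;
        elapsed += t;
    }

    // Rates over all invocations: total work divided by total time.
    double bandwidth_gb_per_s() const
    {
        assert(elapsed.count() > 0.0);
        return work.bytes / elapsed.count() / 1e9;
    }
    double gflop_per_s() const
    {
        assert(elapsed.count() > 0.0);
        return work.flops / elapsed.count() / 1e9;
    }
};
```

Summing before dividing weights each invocation by its runtime, which differs from averaging per-call rates when call sizes vary.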

reg:build
reg:testing
mod:core
type:matrix-format
1:ST:ready-for-review
type:factorization
reg:helper-scripts

As a starting point and example for adding work estimates to kernels, this adds the necessary operations to all non-trivial kernels in a simple unpreconditioned Cg solve. Example output for...
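For illustration, here are conventional rough work estimates for the generic building blocks of an unpreconditioned CG iteration (SpMV, dot product, axpy). They assume double values, 32-bit indices, and each vector element read or written once. The helper names are hypothetical, and the PR's kernels may be fused differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical work estimate: operation count and memory traffic in bytes.
struct Work {
    std::int64_t flops;
    std::int64_t bytes;
};

// Assumptions: double-precision values and 32-bit indices.
constexpr std::int64_t value_bytes = sizeof(double);
constexpr std::int64_t index_bytes = sizeof(std::int32_t);

// y = A * x with square A in CSR format (n rows, nnz stored entries).
constexpr Work csr_spmv_work(std::int64_t n, std::int64_t nnz)
{
    return {2 * nnz,                                // one multiply + one add per entry
            nnz * (value_bytes + index_bytes)       // values and column indices
                + (n + 1) * index_bytes             // row pointers
                + n * value_bytes                   // read x (each entry once, a lower bound)
                + n * value_bytes};                 // write y
}

// alpha = x^T y: one multiply + one add per element, read x and y.
constexpr Work dot_work(std::int64_t n) { return {2 * n, 2 * n * value_bytes}; }

// y = y + alpha * x: read x and y, write y.
constexpr Work axpy_work(std::int64_t n) { return {2 * n, 3 * n * value_bytes}; }
```

These are lower bounds: cache misses on `x` during the SpMV increase the actual traffic, so bandwidths derived from them are conservative.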

mod:core
type:solver
type:matrix-format
1:ST:ready-for-review

We could significantly improve the performance of `matrix_assembly_data` by building a row-wise flat hash map ourselves. For that we only need an upper bound for the number of columns...
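The issue is truncated, so this is only a sketch of the general idea under stated assumptions: each row gets a fixed-capacity open-addressing table sized from a per-row column bound, all rows share flat arrays, and duplicate entries are summed. `RowwiseFlatHashMap` and its members are hypothetical and not the `matrix_assembly_data` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one fixed-capacity linear-probing table per row,
// stored in flat arrays. Duplicate (row, col) entries are summed.
class RowwiseFlatHashMap {
public:
    RowwiseFlatHashMap(std::int64_t num_rows, std::int64_t max_cols_per_row)
        : capacity_{next_pow2(2 * max_cols_per_row)},
          max_cols_{max_cols_per_row},
          cols_(num_rows * capacity_, empty_slot),
          vals_(num_rows * capacity_, 0.0),
          row_nnz_(num_rows, 0)
    {
        assert(num_rows >= 0 && max_cols_per_row > 0);
    }

    void add_value(std::int64_t row, std::int64_t col, double value)
    {
        const std::int64_t slot = find_slot(row, col);
        if (cols_[slot] == empty_slot) {
            // The column bound keeps every table at most half full.
            assert(row_nnz_[row] < max_cols_ && "column bound exceeded");
            cols_[slot] = col;
            ++row_nnz_[row];
        }
        vals_[slot] += value;  // new slots start at 0.0
    }

    double get_value(std::int64_t row, std::int64_t col) const
    {
        const std::int64_t slot = find_slot(row, col);
        return cols_[slot] == col ? vals_[slot] : 0.0;
    }

    std::int64_t row_nnz(std::int64_t row) const { return row_nnz_[row]; }

private:
    static constexpr std::int64_t empty_slot = -1;

    static std::int64_t next_pow2(std::int64_t x)
    {
        std::int64_t p = 1;
        while (p < x) p *= 2;
        return p;
    }

    // Linear probing: returns the slot holding col, or the first empty slot.
    std::int64_t find_slot(std::int64_t row, std::int64_t col) const
    {
        assert(col >= 0);
        auto h = static_cast<std::uint64_t>(col) * 0x9E3779B97F4A7C15ull;
        h ^= h >> 32;  // mix high bits into the low bits used as the index
        const std::int64_t mask = capacity_ - 1;
        std::int64_t i = static_cast<std::int64_t>(h & static_cast<std::uint64_t>(mask));
        const std::int64_t base = row * capacity_;
        while (cols_[base + i] != col && cols_[base + i] != empty_slot) {
            i = (i + 1) & mask;
        }
        return base + i;
    }

    std::int64_t capacity_;
    std::int64_t max_cols_;
    std::vector<std::int64_t> cols_;
    std::vector<double> vals_;
    std::vector<std::int64_t> row_nnz_;
};
```

Sizing each table to at least twice the bound keeps the load factor at or below 1/2, so probing always reaches an empty slot, and no per-entry allocation is needed.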

is:idea

The former is being phased out; see https://github.com/ROCm/roctracer/issues/56#issuecomment-2385675072 for more details.

is:todo

- [x] memory atomics
- [x] sorting
- [x] bitvectors
- [ ] searching
- [ ] merging
- [ ] sync-free operations

reg:build
reg:benchmarking

### Steps to reproduce
When installing multiple versions of LLVM that use the same patch files, I am seeing test failures that look like a race condition between the creation/access...

bug
impact-high

### Steps to reproduce
Somewhat related to #50696. When building multiple LLVM versions that share the same patch files, for some reason the write locks on the patch files are...

bug
impact-high