Tobias Ribizel issues

Results: 105 issues of Tobias Ribizel

Improve Symbolic Cholesky performance

This improves the symbolic Cholesky performance by preprocessing the matrix on the GPU with a Minimum Spanning Tree algorithm. Example rgg_22 from SuiteSparse with METIS nested dissection on H100: *...

reg:testing

reg:benchmarking

1:ST:ready-for-review

type:factorization

reg:helper-scripts

mod:all

Add primitive for warp load balancing

This adds a primitive that allows the distribution of variable-sized chunks of work across a warp for better memory coalescing and warp utilization. This can be used as a component...

reg:build

reg:testing

mod:core

mod:cuda

mod:hip

Add bucketsort kernels

Extracted from the symbolic Cholesky (but might also be useful for other things, e.g. parallel OpenMP COO sorting)
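As a rough illustration of the COO sorting use case mentioned above, here is a minimal sequential sketch of a stable bucket sort by row index. It is not the kernel from this pull request: the `CooEntry` type and the `bucket_sort_by_row` function are hypothetical names chosen for this example, and a parallel version would additionally need per-thread counts or atomics.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical COO entry type, for illustration only (not the library's API).
struct CooEntry {
    int row;
    int col;
    double value;
};

// Stable bucket sort of COO entries by row index:
// 1. count the entries in each row (bucket),
// 2. turn the counts into bucket start offsets via a prefix sum,
// 3. scatter each entry to the next free slot of its bucket.
// Assumes 0 <= e.row < num_rows for every entry.
std::vector<CooEntry> bucket_sort_by_row(const std::vector<CooEntry>& in,
                                         int num_rows)
{
    std::vector<std::size_t> offsets(num_rows + 1, 0);
    for (const auto& e : in) {
        ++offsets[e.row + 1];
    }
    for (int r = 0; r < num_rows; ++r) {
        offsets[r + 1] += offsets[r];
    }
    std::vector<CooEntry> out(in.size());
    for (const auto& e : in) {
        // offsets[e.row] is the next free position in this row's bucket
        out[offsets[e.row]++] = e;
    }
    return out;
}
```

Because entries are scattered in input order, entries within the same row keep their relative order, which is what makes the sort stable.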

reg:build

reg:testing

mod:cuda

mod:openmp

1:ST:ready-for-review

mod:hip

Add logger for benchmark work estimate output

This adds another optional column to the `ProfilerHook::create_(nested_)summary` logger that computes memory bandwidths/FLOPS/custom rates for kernels with work estimates.

Merge stack:
- [x] #1782
- [ ] #802

reg:build

reg:testing

mod:core

type:matrix-format

1:ST:ready-for-review

type:factorization

reg:helper-scripts

Add benchmark work estimate for simple Cg solve

As a starting point and example for adding work estimates to kernels, this adds the necessary operations to all non-trivial kernels in a simple unpreconditioned Cg solve. Example output for...

mod:core

type:solver

type:matrix-format

1:ST:ready-for-review

Improve matrix_assembly_data performance

We could make the performance of `matrix_assembly_data` much better by building a row-wise flat hash map ourselves. For that we only need an upper bound for the number of columns...

is:idea

Replace `roctracer` by `rocprofiler-sdk`

The former is being phased out; see https://github.com/ROCm/roctracer/issues/56#issuecomment-2385675072 for more details.

is:todo

Add microbenchmarks for various components

- [x] memory atomics
- [x] sorting
- [x] bitvectors
- [ ] searching
- [ ] merging
- [ ] sync-free operations

reg:build

reg:benchmarking

Build failures due to race condition in patch file handling

### Steps to reproduce

When installing multiple versions of LLVM that use the same patch files, I am seeing test failures that look like a race condition between the creation/access...

bug

impact-high

Locks on patch files in stage directory cause hangs in concurrent builds

### Steps to reproduce

Somewhat related to #50696

When building multiple LLVM versions that share the same patch files, for some reason the write locks on the patch files are...

bug

impact-high