Tobias Ribizel
I'd like to investigate implementing a reduction for associative but non-commutative operations (related to https://github.com/NVIDIA/thrust/issues/1434). This kind of algorithm comes in handy when establishing the global context in parsing...
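To make the ordering requirement concrete, here is a minimal sequential sketch of one way such a reduction can be structured. It is not taken from the linked issue or from any library: the name `ordered_chunked_reduce` and its parameters are hypothetical. It assumes only that `op` is associative and that `identity` is a neutral element for it. The input is split into contiguous chunks, each chunk is folded left to right, and the partial results are combined in chunk order, so operands are never swapped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only.
// Reduces `values` with an associative (not necessarily commutative) `op`.
// Each chunk is folded left to right, and the per-chunk partial results are
// combined in chunk order, so the relative order of operands is preserved.
template <typename T, typename Op>
T ordered_chunked_reduce(const std::vector<T>& values, T identity, Op op,
                         std::size_t num_chunks)
{
    assert(num_chunks > 0);
    const std::size_t n = values.size();
    // Ceiling division, so that all elements are covered.
    const std::size_t chunk_size = (n + num_chunks - 1) / num_chunks;
    std::vector<T> partial(num_chunks, identity);
    // The chunks are independent of each other; in a parallel version each
    // one could be handled by its own thread or thread block.
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const std::size_t begin = std::min(n, c * chunk_size);
        const std::size_t end = std::min(n, begin + chunk_size);
        for (std::size_t i = begin; i < end; ++i) {
            partial[c] = op(partial[c], values[i]);
        }
    }
    // Combine the partial results strictly from left to right.
    T result = identity;
    for (const auto& p : partial) {
        result = op(result, p);
    }
    return result;
}
```

String concatenation is a convenient test operation because it is associative but not commutative: reducing `{"a", "b", "c", "d", "e"}` with it should give `"abcde"` for any chunk count, whereas a reduction that reorders operands would not.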
CMake has (or has gained) many features for which we still carry our own custom workarounds, which I think we should remove soon:
* CUDA device architectures can be auto-detected since CMake...
A lot of the files we have in `common/cuda_hip` don't work standalone, but instead rely on certain other headers already being included wherever they are used. Here I'm trying to change this,...
Not sure how specific we should be here; this is probably overkill.
There are different sizes inside the `index_set` that have unclear names. I want to propose `get_subset_size()` or `get_local_size()` and `get_superset_size()` or `get_global_size()`, as well as `get_num_ranges()` instead of `get_num_subsets()`. See...
I think the current factory parameter setup could use a few improvements:
- [x] The factory parameters generated by our macros are `mutable` by default! That's a big code...
Still need to check performance on this, but we can really use the atomic operations based on OpenMP primitives.
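As a rough illustration of what an atomic operation built on an OpenMP primitive can look like, here is a minimal sketch. It is not the code from this change: `atomic_add` and `histogram` are hypothetical names, and the sketch assumes an arithmetic value type for which `#pragma omp atomic` is valid. When compiled without OpenMP support the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: atomically adds `value` to
// `target` using the OpenMP atomic construct instead of a hand-written
// compare-and-swap loop.
template <typename T>
void atomic_add(T& target, T value)
{
#pragma omp atomic
    target += value;
}

// Counts how often each key in [0, num_bins) occurs. Several threads may hit
// the same bin at once, which is why the increment has to be atomic.
std::vector<int> histogram(const std::vector<int>& keys, int num_bins)
{
    std::vector<int> bins(static_cast<std::size_t>(num_bins), 0);
    const long n = static_cast<long>(keys.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        atomic_add(bins[static_cast<std::size_t>(keys[i])], 1);
    }
    return bins;
}
```

Whether this performs as well as a specialized implementation is exactly the open question noted above, so the sketch only shows the shape of the approach.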
Allow `DpcppExecutor` to be constructed from a `sycl::device`, which enables the sub-device usage required for #1373. I also added a handful of fixes for deprecation warnings.
As a tool for implementing reusable factories, this adds reusable functionality for all Csr permutation and transpose functions. It also takes a first step towards making `Permutation` the default representation...
Currently ReferenceExecutor derives from OmpExecutor to inherit the allocation and copy functionality. That means that some OpenMP functionality needs to be compiled even with OpenMP disabled. Maybe we want to...