ginkgo
Current NOT_IMPLEMENTED kernels
We should discuss which kernels we should implement before the release. Since I lost the overview, I looked in the code and searched for kernels which are marked as `GKO_NOT_IMPLEMENTED`.
Interestingly, we do not support `Coo::transpose` and `Coo::conj_transpose` anywhere, which should be fairly trivial to implement (swapping `col_idxs_` and `row_idxs_` with `std::move`, sorting, followed by complex conjugation for `conj_transpose`).
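The steps named above (swap the index arrays, sort, optionally conjugate) can be sketched as follows. This is only a minimal illustration on a hypothetical stand-alone `SimpleCoo` struct, not Ginkgo's `Coo` class or its kernel interface; the names `SimpleCoo`, `conj_value` and `transpose_coo` are assumptions made for this example.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <numeric>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a COO matrix; this is NOT Ginkgo's Coo class.
template <typename ValueType, typename IndexType = int>
struct SimpleCoo {
    std::size_t num_rows{};
    std::size_t num_cols{};
    std::vector<IndexType> row_idxs;
    std::vector<IndexType> col_idxs;
    std::vector<ValueType> values;
};

// Conjugation is the identity for real values ...
template <typename T>
T conj_value(const T& v)
{
    return v;
}

// ... and std::conj for complex values.
template <typename T>
std::complex<T> conj_value(const std::complex<T>& v)
{
    return std::conj(v);
}

// Returns the (conjugate) transpose of `in`, sorted by (row, column).
template <typename ValueType, typename IndexType>
SimpleCoo<ValueType, IndexType> transpose_coo(
    SimpleCoo<ValueType, IndexType> in, bool conjugate = false)
{
    // Step 1: swapping the index arrays turns every entry (i, j) into (j, i).
    std::swap(in.row_idxs, in.col_idxs);
    std::swap(in.num_rows, in.num_cols);

    // Step 2: compute the permutation that sorts the entries by (row, column).
    const auto nnz = in.values.size();
    std::vector<std::size_t> perm(nnz);
    std::iota(perm.begin(), perm.end(), std::size_t{});
    std::sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return std::tie(in.row_idxs[a], in.col_idxs[a]) <
               std::tie(in.row_idxs[b], in.col_idxs[b]);
    });

    // Step 3: apply the permutation, conjugating the values if requested.
    SimpleCoo<ValueType, IndexType> out;
    out.num_rows = in.num_rows;
    out.num_cols = in.num_cols;
    out.row_idxs.reserve(nnz);
    out.col_idxs.reserve(nnz);
    out.values.reserve(nnz);
    for (auto p : perm) {
        out.row_idxs.push_back(in.row_idxs[p]);
        out.col_idxs.push_back(in.col_idxs[p]);
        out.values.push_back(conjugate ? conj_value(in.values[p])
                                       : in.values[p]);
    }
    return out;
}
```

The sketch takes its input by value so the caller can `std::move` a matrix in and avoid copying the index arrays; a real implementation would presumably operate on executor-managed arrays instead of `std::vector`.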
The following are lists of `GKO_NOT_IMPLEMENTED` kernels (those that are actually not implemented at all, not those that merely have a single case which is not implemented):
CUDA
- [ ] index_set
- [x] csr
  - [x] convert_to_fbcsr
- [x] dense
  - [x] convert_to_fbcsr
  - [x] count_nonzero_blocks_per_row
  - [x] convert_to_sparsity_csr #904
- [ ] fbcsr
  - [x] fill_in_dense
  - [ ] sort_by_column_index
  - [ ] extract_diagonal
- [ ] sparsity_csr
- [ ] jacobi
  - [ ] convert_to_dense
DPC++
- [ ] index_set
- [ ] partition #1034
- [x] factorization #928
  - [ ] ic
  - [ ] ilu
  - [x] par_ic #928
  - [x] par_ilu #928
  - [x] par_ict #928
  - [x] par_ilut #928
- [ ] csr
  - [ ] convert_to_fbcsr
- [ ] dense
  - [ ] convert_to_fbcsr
  - [ ] count_nonzero_blocks_per_row
  - [x] convert_to_hybrid #904
  - [x] convert_to_sparsity_csr #904
- [ ] fbcsr
- [ ] fft
- [x] hybrid #904
- [ ] sparsity_csr
- [x] amgx_pgm #933
- [x] isai #924
- [ ] jacobi #929
- [ ] lower_trs
- [ ] upper_trs
- [x] multigrid
HIP
- [ ] index_set
- [x] csr
  - [x] convert_to_fbcsr
- [x] dense
  - [x] convert_to_fbcsr
  - [x] count_nonzero_blocks_per_row
  - [x] convert_to_sparsity_csr #904
- [x] fbcsr
  - [x] spmv
  - [x] advanced_spmv
  - [x] fill_in_dense
  - [ ] transpose
  - [ ] conj_transpose
  - [ ] sort_by_column_index
  - [x] is_sorted_by_column_index
  - [ ] extract_diagonal
- [ ] sparsity_csr
- [ ] jacobi
  - [ ] convert_to_dense
OpenMP
- [ ] ic
- [ ] ilu
- [x] fbcsr
  - [x] convert_to_csr
Current conversions supported (maybe not fully implemented)
- Everything -> Csr
- Everything -> Dense
- Csr -> Everything
- Dense -> Everything
Effective: 14.10.2019
Updated: 01.02.2022
I will definitely take care of the `move_to_XXX` replacements, but for the others, we should discuss what is necessary.
I can also take care of `COO::transpose`; it should also be pretty straightforward.
I think the reference transpose and conjugate transpose should be easy. No need for the other executors. I don't have a strong opinion on the others - I think they are all optional.
`transpose` in general is easy, I don't even have to implement any kernels (since I just swap the col and row index arrays).
Additionally, I think we should also implement at least the OMP kernels, so we actually support everything with OMP that we do with CUDA.
I tried to check off everything that is "taken care of". Overall, I think what is left is:
- CUDA Dense <-> Hybrid conversions (hard to implement AFAIK)
- OpenMP CSR <-> Ell, Sellp
- Everywhere CSR <-> Hybrid
I believe none of these are strictly required for the release.
That is true; I removed the dependency on the release. We also said previously that we do not require the OpenMP kernels since we have the reference version.
Note: this was updated to the last status.