ginkgo Current NOT_IMPLEMENTED kernels

We should discuss which kernels we need to implement before the release. Since I lost the overview, I looked in the code and searched for kernels which are marked as GKO_NOT_IMPLEMENTED.

Interestingly, we do not support Coo::transpose and Coo::conj_transpose anywhere, which should be fairly trivial to implement (swapping col_idxs_ and row_idxs_ with std::move, sorting, followed by complex conjugation for conj_transpose).
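The three steps mentioned above (swap the index arrays, sort, conjugate) can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `CooSketch`, `conj_value` and `transpose_sketch` are hypothetical names made up for this example and are not part of the Ginkgo API, and the sketch sorts into row-major order on the host instead of using any executor-specific kernel.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a COO matrix; NOT gko::matrix::Coo.
template <typename ValueType, typename IndexType = int>
struct CooSketch {
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<IndexType> row_idxs;
    std::vector<IndexType> col_idxs;
    std::vector<ValueType> values;
};

// Conjugation helper: identity for real types, std::conj for complex types.
template <typename T>
T conj_value(const T& v)
{
    return v;
}

template <typename T>
std::complex<T> conj_value(const std::complex<T>& v)
{
    return std::conj(v);
}

// Returns the (optionally conjugate) transpose, sorted by (row, column).
// The argument is taken by value so the caller's matrix stays untouched.
template <typename ValueType, typename IndexType>
CooSketch<ValueType, IndexType> transpose_sketch(
    CooSketch<ValueType, IndexType> in, bool conjugate = false)
{
    // Step 1: swap the dimensions and the two index arrays (no copies).
    std::swap(in.num_rows, in.num_cols);
    std::swap(in.row_idxs, in.col_idxs);

    // Step 2: compute the permutation that sorts entries by (row, column).
    std::vector<std::size_t> perm(in.values.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return std::make_pair(in.row_idxs[a], in.col_idxs[a]) <
               std::make_pair(in.row_idxs[b], in.col_idxs[b]);
    });

    // Step 3: apply the permutation and conjugate the values if requested.
    CooSketch<ValueType, IndexType> out{in.num_rows, in.num_cols, {}, {}, {}};
    out.row_idxs.reserve(perm.size());
    out.col_idxs.reserve(perm.size());
    out.values.reserve(perm.size());
    for (auto p : perm) {
        out.row_idxs.push_back(in.row_idxs[p]);
        out.col_idxs.push_back(in.col_idxs[p]);
        out.values.push_back(conjugate ? conj_value(in.values[p])
                                       : in.values[p]);
    }
    return out;
}
```

Calling `transpose_sketch(a)` gives the plain transpose and `transpose_sketch(a, true)` the conjugate transpose. Whether the sorting step is actually required depends on whether the format is expected to keep its entries in row-major order; the sketch assumes it is.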

The following lists show the GKO_NOT_IMPLEMENTED kernels (those that are entirely unimplemented, not those that merely have a single unimplemented case):

CUDA

[ ] index_set
[x] csr
- [x] convert_to_fbcsr
[x] dense
- [x] convert_to_fbcsr
- [x] count_nonzero_blocks_per_row
- [x] convert_to_sparsity_csr #904
[ ] fbcsr
- [x] fill_in_dense
- [ ] sort_by_column_index
- [ ] extract_diagonal
[ ] sparsity_csr
[ ] jacobi
- [ ] convert_to_dense

DPC++

[ ] index_set
[ ] partition #1034
[x] factorization #928
[ ] ic
[ ] ilu
[x] par_ic #928
[x] par_ilu #928
[x] par_ict #928
[x] par_ilut #928
[ ] csr
- [ ] convert_to_fbcsr
[ ] dense
- [ ] convert_to_fbcsr
- [ ] count_nonzero_blocks_per_row
- [x] convert_to_hybrid #904
- [x] convert_to_sparsity_csr #904
[ ] fbcsr
[ ] fft
[x] hybrid #904
[ ] sparsity_csr
[x] amgx_pgm #933
[x] isai #924
[ ] jacobi #929
[ ] lower_trs
[ ] upper_trs
[x] multigrid

HIP

[ ] index_set
[x] csr
- [x] convert_to_fbcsr
[x] dense
- [x] convert_to_fbcsr
- [x] count_nonzero_blocks_per_row
- [x] convert_to_sparsity_csr #904
[x] fbcsr
- [x] spmv
- [x] advanced_spmv
- [x] fill_in_dense
- [ ] transpose
- [ ] conj_transpose
- [ ] sort_by_column_index
- [x] is_sorted_by_column_index
- [ ] extract_diagonal
[ ] sparsity_csr
[ ] jacobi
- [ ] convert_to_dense

OpenMP

[ ] ic
[ ] ilu
[x] fbcsr
- [x] convert_to_csr

Current conversions supported (maybe not fully implemented)

Everything -> Csr
Everything -> Dense
Csr -> Everything
Dense -> Everything

Effective: 14.10.2019

Updated: 01.02.2022

Apr 02 '19 17:04 thoasm

I will definitely take care of the move_to_XXX replacements, but for the others, we should discuss what is necessary. I can also take care of Coo::transpose; it should be pretty straightforward.

Apr 02 '19 17:04 thoasm

I think the reference transpose and conjugate transpose should be easy. No need for the other executors. I don't have a strong opinion on the others - I think they are all optional.

Apr 03 '19 09:04 hartwiganzt

Transpose in general is easy; I don't even have to implement any kernels (since I just swap the col and row index arrays). Additionally, I think we should implement at least the OMP kernels, so that we support everything with OMP that we support with CUDA.

Apr 03 '19 09:04 thoasm

I tried to check everything that is "taken care of". Overall, what is left I think is:

CUDA Dense <-> Hybrid conversions (hard to implement AFAIK)
OpenMP CSR <-> Ell, Sellp
Everywhere CSR <-> Hybrid

I believe none of these are strictly required for the release.

Apr 05 '19 13:04 tcojean

That is true; I removed the dependency on the release. We also said previously that we do not require the OpenMP kernels since we have the reference version.

Apr 05 '19 13:04 thoasm

Note: this was updated to the latest status.

Oct 14 '19 09:10 tcojean
