Current NOT_IMPLEMENTED kernels

Open thoasm opened this issue 5 years ago • 6 comments

We should discuss what kernels we should implement before the release. Since I lost the overview, I looked in the code and searched for kernels which are marked as GKO_NOT_IMPLEMENTED.

Interestingly, we do not support Coo::transpose and Coo::conj_transpose anywhere, which should be fairly trivial to implement (swapping col_idxs_ and row_idxs_ with std::move, sorting, followed by complex conjugation for conj_transpose).
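To make the idea concrete, here is a minimal sketch of the steps described above (swap the index arrays, sort, then conjugate for conj_transpose). It is only an illustration under stated assumptions: `CooSketch`, `conj_value`, and `transpose_sketch` are hypothetical names on a plain struct, not Ginkgo's actual `Coo` class or kernel interface, and the sort by (row, column) is one possible way to restore a row-major ordering.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a COO matrix; NOT Ginkgo's actual Coo class.
template <typename ValueType, typename IndexType = int>
struct CooSketch {
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<IndexType> row_idxs;
    std::vector<IndexType> col_idxs;
    std::vector<ValueType> values;
};

// Conjugation helper: identity for real types, std::conj for complex types.
template <typename T>
T conj_value(const T& v)
{
    return v;
}

template <typename T>
std::complex<T> conj_value(const std::complex<T>& v)
{
    return std::conj(v);
}

// Returns the (conjugate) transpose of `in`, sorted by (row, column).
template <typename ValueType, typename IndexType>
CooSketch<ValueType, IndexType> transpose_sketch(
    CooSketch<ValueType, IndexType> in, bool conjugate)
{
    CooSketch<ValueType, IndexType> out;
    out.num_rows = in.num_cols;
    out.num_cols = in.num_rows;

    // Step 1: swap the roles of the index arrays (moves, no copies).
    auto rows = std::move(in.col_idxs);
    auto cols = std::move(in.row_idxs);

    // Step 2: compute a permutation that sorts entries by (row, column).
    const auto nnz = in.values.size();
    std::vector<std::size_t> perm(nnz);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return std::make_pair(rows[a], cols[a]) <
               std::make_pair(rows[b], cols[b]);
    });

    // Step 3: apply the permutation, conjugating values if requested.
    out.row_idxs.reserve(nnz);
    out.col_idxs.reserve(nnz);
    out.values.reserve(nnz);
    for (auto p : perm) {
        out.row_idxs.push_back(rows[p]);
        out.col_idxs.push_back(cols[p]);
        out.values.push_back(conjugate ? conj_value(in.values[p])
                                       : in.values[p]);
    }
    return out;
}
```

Calling `transpose_sketch(m, false)` would give the plain transpose and `transpose_sketch(m, true)` the conjugate transpose. If sorted indices are not required, the sort step could be dropped, leaving only the swap and the optional conjugation.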

The following lists contain the GKO_NOT_IMPLEMENTED kernels that are entirely unimplemented (not kernels that merely have a single unimplemented case):

CUDA

  • [ ] index_set
  • [x] csr
    • [x] convert_to_fbcsr
  • [x] dense
    • [x] convert_to_fbcsr
    • [x] count_nonzero_blocks_per_row
    • [x] convert_to_sparsity_csr #904
  • [ ] fbcsr
    • [x] fill_in_dense
    • [ ] sort_by_column_index
    • [ ] extract_diagonal
  • [ ] sparsity_csr
  • [ ] jacobi
    • [ ] convert_to_dense

DPC++

  • [ ] index_set
  • [ ] partition #1034
  • [x] factorization #928
  • [ ] ic
  • [ ] ilu
  • [x] par_ic #928
  • [x] par_ilu #928
  • [x] par_ict #928
  • [x] par_ilut #928
  • [ ] csr
    • [ ] convert_to_fbcsr
  • [ ] dense
    • [ ] convert_to_fbcsr
    • [ ] count_nonzero_blocks_per_row
    • [x] convert_to_hybrid #904
    • [x] convert_to_sparsity_csr #904
  • [ ] fbcsr
  • [ ] fft
  • [x] hybrid #904
  • [ ] sparsity_csr
  • [x] amgx_pgm #933
  • [x] isai #924
  • [ ] jacobi #929
  • [ ] lower_trs
  • [ ] upper_trs
  • [x] multigrid

HIP

  • [ ] index_set
  • [x] csr
    • [x] convert_to_fbcsr
  • [x] dense
    • [x] convert_to_fbcsr
    • [x] count_nonzero_blocks_per_row
    • [x] convert_to_sparsity_csr #904
  • [x] fbcsr
    • [x] spmv
    • [x] advanced_spmv
    • [x] fill_in_dense
    • [ ] transpose
    • [ ] conj_transpose
    • [ ] sort_by_column_index
    • [x] is_sorted_by_column_index
    • [ ] extract_diagonal
  • [ ] sparsity_csr
  • [ ] jacobi
    • [ ] convert_to_dense

OpenMP

  • [ ] ic
  • [ ] ilu
  • [x] fbcsr
    • [x] convert_to_csr

Current conversions supported (maybe not fully implemented)

  • Everything -> Csr
  • Everything -> Dense
  • Csr -> Everything
  • Dense -> Everything

Effective: 14.10.2019

Updated: 01.02.2022

thoasm, Apr 02 '19 17:04

I will definitely take care of the move_to_XXX replacements, but for the others, we should discuss what is necessary. I can also take care of Coo::transpose; it should be pretty straightforward as well.

thoasm, Apr 02 '19 17:04

I think the reference transpose and conjugate transpose should be easy. No need for the other executors. I don't have a strong opinion on the others - I think they are all optional.

hartwiganzt, Apr 03 '19 09:04

Transpose in general is easy: I don't even have to implement any kernels, since I just swap the col and row index arrays. Additionally, I think we should implement at least the OMP kernels, so that we support everything with OMP that we support with CUDA.

thoasm, Apr 03 '19 09:04

I tried to check off everything that is "taken care of". Overall, I think what is left is:

  • CUDA Dense <-> Hybrid conversions (hard to implement AFAIK)
  • OpenMP CSR <-> Ell, Sellp
  • Everywhere CSR <-> Hybrid

I believe none of these is strictly required for the release.

tcojean, Apr 05 '19 13:04

That is true; I removed the dependency on the release. We also said previously that we do not require the OpenMP kernels, since we have the reference version.

thoasm, Apr 05 '19 13:04

Note: this has been updated to reflect the latest status.

tcojean, Oct 14 '19 09:10