Carl Pearson

135 comments by Carl Pearson

Despite the comment `// Need to fence before reading from solution_host`, it seems that `create_mirror_view_and_copy`, as it is called here, already fences host and device before returning.

`sparse_bsr_gauss_seidel_rank1_double` seems to be fixed by substituting this:

```c++
Kokkos::parallel_for("KokkosGraph::FillLowerTriangle", nv,
                     FillLowerTriangle(nv, xadj, adj, lower_count, half_src, half_dst));
```

for what we currently do at https://github.com/kokkos/kokkos-kernels/blob/ae12a2d4f285d1db71721f93e6bf3564f9cede48/graph/src/KokkosGraph_Distance1ColorHandle.hpp#L435-L439, i.e., using the range version rather than...

Replacing the `ThreadVectorRange` in `FillLowerTriangleTeam` with this also appears to work correctly:

```c++
for (size_type adjind = xadj_begin; adjind < xadj_end; ++adjind) {
  nnz_lno_t n = adj[adjind];
  if (ii <...
```

I modified this function https://github.com/kokkos/kokkos-kernels/blob/8912c6d323ca008120afc3741264bccd772262fa/graph/src/KokkosGraph_Distance1ColorHandle.hpp#L323-L344 so that it prints `ii`, `adjind`, and `position` inside the `ThreadVectorRange` after line 339. If I run the tests repeatedly, it usually works, but occasionally...

Replacing `Kokkos::atomic_fetch_add` with `atomicAdd_system` also seems to resolve the issue.
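To make the pattern under discussion concrete, here is a minimal host-side sketch in standard C++ (not Kokkos, and not the actual `FillLowerTriangleTeam` code): each worker reserves a unique output slot with an atomic fetch-add on a shared counter, then writes the `(ii, n)` pair into that slot. The names `fill_lower_triangle` and `HalfEdges`, the CSR layout, and the `n < ii` filter are hypothetical assumptions for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical output: one (src, dst) pair per lower-triangular edge.
struct HalfEdges {
  std::vector<int> src;
  std::vector<int> dst;
};

// Graph is in CSR form: neighbors of vertex ii are adj[xadj[ii] .. xadj[ii + 1]).
HalfEdges fill_lower_triangle(int nv, const std::vector<std::size_t>& xadj,
                              const std::vector<int>& adj, int num_threads) {
  HalfEdges out;
  // Upper bound on the output size: every stored adjacency entry.
  out.src.resize(adj.size());
  out.dst.resize(adj.size());
  std::atomic<std::size_t> lower_count{0};

  auto worker = [&](int tid) {
    // Vertices are distributed round-robin over the threads.
    for (int ii = tid; ii < nv; ii += num_threads) {
      for (std::size_t adjind = xadj[ii]; adjind < xadj[ii + 1]; ++adjind) {
        int n = adj[adjind];
        if (n < ii) {  // assumed filter: keep each undirected edge once
          // fetch_add returns a distinct value to every caller, so no two
          // workers ever write the same slot.
          std::size_t position =
              lower_count.fetch_add(1, std::memory_order_relaxed);
          assert(position < out.src.size());
          out.src[position] = ii;
          out.dst[position] = n;
        }
      }
    }
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
  // join() makes every worker's writes visible to this thread.
  for (auto& th : threads) th.join();

  out.src.resize(lower_count.load());
  out.dst.resize(lower_count.load());
  return out;
}
```

On the host, `std::thread::join` supplies the ordering needed before the results are read; the slot-uniqueness argument only relies on the fetch-add being atomic, which is why relaxed ordering suffices for the counter itself.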

`Kokkos::atomic_fetch_add` emits:

```ptx
mov.u32 %r46, 1;
atom.add.global.relaxed.gpu.s32 %r45,[%rd28],%r46;
```

`atomicAdd_system` emits:

```ptx
atom.global.sys.add.u32 %r20, [%rd28], 1;
```

`atomicAdd` (which also doesn't work) emits:

```ptx
atom.global.add.u32 %r20, [%rd28], 1;
```

Can you please provide the compiler used?

@brian-kelley is correct: for `uk-2005` the correct number of non-zeros is `8972400198`, which overflows a 32-bit integer. This code determines nnz, but it uses the row map value type, which is...
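A quick check of the arithmetic, as a standard C++ sketch (the helper names `fits_in_int32` and `truncate_to_32_bits` are hypothetical, not part of kokkos-kernels): `8972400198` exceeds both `INT32_MAX` (2147483647) and `UINT32_MAX` (4294967295), so it cannot be stored in any 32-bit offset type, while a 64-bit `size_t` offset holds it easily.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if a non-zero count can be stored in a 32-bit signed offset.
bool fits_in_int32(std::int64_t nnz) {
  return nnz <= std::numeric_limits<std::int32_t>::max();
}

// Conversion to an unsigned 32-bit type reduces the value modulo 2^32,
// showing what is left if the count is squeezed into 32 bits.
std::uint32_t truncate_to_32_bits(std::int64_t nnz) {
  assert(nnz >= 0);  // counts are non-negative
  return static_cast<std::uint32_t>(nnz);
}
```

For `uk-2005`, `truncate_to_32_bits(8972400198)` gives `382465606` (that is, 8972400198 - 2 * 2^32), a plausible-looking but wrong count rather than an obvious error.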

This seems to work with the following CMake options:

```bash
-DKokkosKernels_INST_OFFSET_SIZE_T=ON -DKokkosKernels_INST_OFFSET_INT=OFF
```