
Kokkos::Tools::Experimental::device_id() seems to return ids associated only with CUDA device 0 on multi-GPU nodes

Open rppawlo opened this issue 1 year ago • 5 comments

I have a Kokkos tool that checks whether a fence call is using the default stream. I call the Kokkos function below to get the id of the default stream:

auto phalanx_default_stream_device_id = Kokkos::Tools::Experimental::device_id(Kokkos::Cuda());

and then run a test that calls:

Kokkos::Cuda().fence()

where I have registered the callback below with kokkos-tools, I get a consistent stream id and an exception is thrown as expected:

  void phalanx_kt_fence_callback(char const *label, uint32_t device_id,
                                 uint64_t * /*fence_id*/)
  {
    // Throw if the fence was issued on the default stream.
    TEUCHOS_TEST_FOR_EXCEPTION(device_id == phalanx_default_stream_device_id,
                               std::runtime_error,
                               "ERROR: the fence \"" << label
                               << "\" with device id=" << device_id
                               << " is the same as the default stream id="
                               << phalanx_default_stream_device_id);
  }

However, if I run the executable with --kokkos-device-id=3 to pick a different GPU on the node, the function does not return a consistent id. It looks like device_id() always returns the stream id for CUDA device 0. Is this intended? How do I get the default stream id for the device that this particular MPI process has chosen?
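To see which part of the id differs between runs, it may help to split the uint32_t passed to the callback into its fields. The sketch below assumes the id packs an 8-bit execution space type, a 7-bit device index, and a 17-bit instance index, from high bits to low. That layout is my reading of Kokkos_Profiling.hpp and should be checked against the installed Kokkos version; the helper names here are hypothetical and not part of Kokkos. Kokkos appears to ship a similar helper, identifier_from_devid, which is preferable if it is available.

```cpp
#include <cstdint>

// ASSUMPTION: bit widths as I understand them from Kokkos_Profiling.hpp.
// Verify against your Kokkos version before relying on them.
constexpr std::uint32_t kDeviceBits   = 7;   // device index (e.g. GPU ordinal)
constexpr std::uint32_t kInstanceBits = 17;  // execution space instance

// Hypothetical helper type, not a Kokkos type.
struct DecodedId {
  std::uint32_t type;      // execution space type, upper 8 bits
  std::uint32_t device;    // device index
  std::uint32_t instance;  // instance (stream) index
};

// Mask with the lowest `bits` bits set.
constexpr std::uint32_t low_mask(std::uint32_t bits) {
  return (std::uint32_t{1} << bits) - 1u;
}

// Split a tool-side device id into its three assumed fields.
constexpr DecodedId decode_tool_device_id(std::uint32_t id) {
  return {id >> (kDeviceBits + kInstanceBits),
          (id >> kInstanceBits) & low_mask(kDeviceBits),
          id & low_mask(kInstanceBits)};
}

// Inverse of decode_tool_device_id, used here only to build test values.
constexpr std::uint32_t encode_tool_device_id(std::uint32_t type,
                                              std::uint32_t device,
                                              std::uint32_t instance) {
  return (type << (kDeviceBits + kInstanceBits)) |
         ((device & low_mask(kDeviceBits)) << kInstanceBits) |
         (instance & low_mask(kInstanceBits));
}
```

Printing decode_tool_device_id(device_id).device inside the fence callback, and for the value returned by Kokkos::Tools::Experimental::device_id(Kokkos::Cuda()), would show whether the device field stays at 0 when the process is launched with --kokkos-device-id=3.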

rppawlo, Dec 12 '23 00:12