ScatterView Improvements
Just some random thoughts before we take this out of experimental:
- make all the `int` enums tag classes
  - would enable `make_scatter_view(view, ScatterSum, ScatterAtomic, ScatterNonDuplicated)`
- should this thing have `extent(..)`?
  - getting the internal view works via `subview`??
- `ScatterValue`:
  - should they be copyable at all?
  - what is the internal handle type?
- remove `join`
- consistently use the update function in operators
Also see #1390 and #1392.
With emphasis on this comment by @crtrott https://github.com/kokkos/kokkos/issues/1390#issuecomment-363463835
And #2276, which is related to @crtrott's third bullet point.
Via Matt: ScatterView should have a way to tell if it is allocated. Using `subview().extent(0)` can segfault, since the default constructor does not construct it safely.
- get rid of `reset_except`; replace it with `reset_duplicates`
- add `is_allocated` to all the data classes
One improvement (and I'm not sure how much it would help; thinking GPU here): if you have a block dimension of (1, 32, 1), you would replicate the data 32x across the blocks and use atomics on these replicated data sets. This way you don't have different grids accessing the atomics...
More thoughts:
- make ScatterView::subview() consistent with View::subview() //semantic mismatch
Hi @janciesko -- is this issue still on your radar?
Here is a bit more on ScatterView; I'm going to start tackling this for real:
Let's start with two use-case scenarios: wrapping and non-wrapping. And let us assume we had decent CTAD and that we had an extents() function from mdspan so we can easily compare dimensions.
Wrapping ScatterView
- Need to scatter into an existing `View` that is used elsewhere in the code
- Do not want the overhead (time and memory capacity) of an extra allocation on GPUs
```cpp
void foo(view_type v) {
  ScatterView sv(v);
  // Kernel to do something with it
  parallel_for(....);
  sv.contribute_into(v);
}
```
Issues:
- On CPUs with duplication I have allocation and deallocation of the duplicates
- awkward that I have to pass `v` again to `contribute_into`; and if I pass something else to `contribute_into`, was `v` then modified or not?
Version where we keep the ScatterView alive
```cpp
struct Foo {
  ScatterView<....> sv;
  void compute(view_type v) {
    // is_allocated() is needed for the case where all extents are static
    if(!sv.is_allocated() || sv.extents() != v.extents()) {
      sv = ScatterView{v};
    } else {
      // set duplicates to zero again if necessary,
      // but don't reset `v` since we may want to add to existing values
      sv.reset_except(v);
    }
    // Use ScatterView
    parallel_for(...);
    sv.contribute_into(v);
  }
};
```
- again, `contribute_into` is awkward, but now so is `reset_except`
Non-Wrapping ScatterView
- I just need to accumulate contributions most efficiently
- Use the result in one place but don't keep it around for other code areas
```cpp
void foo() {
  ScatterView<int**> sv("SV", N, M);
  // Kernel to do some accumulation
  parallel_for(...);
  // get the "master" View
  View<int**> v = sv.subview();
  sv.contribute_into(v);
  // Kernel to do something with v
}
```
Issues:
- Uhm, `subview` to get the "master" `View`?
- ill-defined what the values of `v` are before `contribute_into`
- awkward that I have to pass `v` again to `contribute_into`
Version keeping it around
```cpp
struct Foo {
  ScatterView<....> sv;
  Foo(...) {
    sv = ScatterView{"Name", N, M};
  }
  void compute() {
    sv.reset();
    // do something with sv
    parallel_for(...);
    auto v = sv.subview();
    sv.contribute_into(v);
    // do something with v
    parallel_for(...);
  }
};
```
Fundamental usage proposal:
- combine `reset_except` and `contribute_into` into a `consolidate` function
  - accumulates values from duplicates into the "master" view and resets them to neutral
- add a `rebind(view)` function to streamline reuse in the wrapping use-case
  - would enable reusing the same ScatterView for multiple views of compatible layout (and size?)
- disallow `rebind` when in "Non-Wrapping" mode
- disallow `resize` and `realloc` when in "Wrapping" mode
- rename `subview` to `result_view` or something similar
Let's discuss interfaces first:
```cpp
template<class DataType, class Layout, class DeviceType, class Op, class Duplication, class Contribution>
class ScatterView {
public:
  // Ctors:
  ScatterView() = default;
  template<class ... ViewArgs>
  ScatterView(View<ViewArgs...>);
  template<class ... ViewArgs>
  ScatterView(execution_space, View<ViewArgs...>);
  template<class ... Extents>
  ScatterView(std::string label, Extents ... exts);
  template<class... P, class ... Extents>
  ScatterView(ViewCtorProp<P...>, Extents ... exts);
};
```
@crtrott, @janciesko -- updated ScatterView documentation is ready for review 😄
Here is the usage for this thing for the above mentioned cases:
Wrapping ScatterView
- Need to scatter into an existing `View` that is used elsewhere in the code
- Do not want the overhead (time and memory capacity) of an extra allocation on GPUs
```cpp
void foo(view_type v) {
  ScatterView sv(v);
  // if you need to zero v out, do it here: deep_copy(v, 0)
  // Kernel to do something with it
  parallel_for(....);
  sv.aggregate();
}
```
Version where we keep the ScatterView alive
```cpp
struct Foo {
  ScatterView<....> sv;
  void compute(view_type v) {
    // is_allocated() is needed for the case where all extents are static
    if(!sv.is_allocated() || sv.extents() != v.extents()) {
      sv = ScatterView{v};
    } else {
      sv.rebind(v);
    }
    // Use ScatterView
    parallel_for(...);
    sv.aggregate();
  }
};
```
Non-Wrapping ScatterView
- I just need to accumulate contributions most efficiently
- Use the result in one place but don't keep it around for other code areas
```cpp
void foo() {
  ScatterView<int**> sv("SV", N, M);
  // Kernel to do some accumulation
  parallel_for(...);
  sv.aggregate();
  // get the "master" View
  View<int**> v = sv.target_view();
  // Kernel to do something with v
}
```
Version keeping it around
```cpp
struct Foo {
  ScatterView<....> sv;
  Foo(...) {
    sv = ScatterView{"Name", N, M};
  }
  void compute() {
    // resetting values
    deep_copy(sv.target_view(), 0);
    // do something with sv
    parallel_for(...);
    sv.aggregate();
    // do something with the target view
    parallel_for(...);
  }
};
```
And a compact test case:
```cpp
template<class SV>
void test_scatter_view(SV as) {
  auto a = as.target_view();
  EXPECT_EQ(a(0), 1);
  // ScatterView does not overwrite values in "target_view"
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i % 10) += 10;
  });
  as.aggregate();
  EXPECT_EQ(a(0), 21);
  // the previous aggregate resets potential duplicates but not the whole thing,
  // so you can run multiple kernels before aggregating
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i % 10) += 11;
  });
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i % 10) += 12;
  });
  as.aggregate();
  EXPECT_EQ(a(0), 67);
  // setting target_view back to 0 means we start from a blank slate
  Kokkos::deep_copy(a, 0);
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i % 10) += 13;
  });
  as.aggregate();
  EXPECT_EQ(a(0), 26);
}
```
```cpp
template<class Layout, class Strategy>
void test_scatter_view() {
  using sv_t = Kokkos::Experimental::
      ScatterView<int*, Layout, Kokkos::DefaultExecutionSpace,
                  Kokkos::Experimental::ScatterSum, Strategy>;
  // wrapping
  {
    Kokkos::View<int*> a("A", 10);
    Kokkos::deep_copy(a, 1);
    sv_t as(a);
    test_scatter_view(as);
    Kokkos::printf("%i\n", a(0));
  }
  // non-wrapping
  {
    sv_t as("ScatterView", 10);
    auto at = as.target_view();
    Kokkos::deep_copy(at, 1);
    test_scatter_view(as);
  }
}
```