kokkos icon indicating copy to clipboard operation
kokkos copied to clipboard

ScatterView Improvements

Open crtrott opened this issue 5 years ago • 10 comments

Just some random thoughts before we take this out of experimental:

  • make all the int enums tag classes
    • would enable make_scatter_view(view,ScatterSum,ScatterAtomic,ScatterNonDuplicated)
  • should this thing have extent(..)?
  • getting the internal view works via "subview"??
  • ScatterValue:
    • should they be copyable at all?
    • what is the internal handle type
    • remove join
    • consistently use the update function in operators

crtrott avatar May 08 '20 17:05 crtrott

Also see #1390 and #1392.

stanmoore1 avatar May 08 '20 18:05 stanmoore1

With emphasis on this comment by @crtrott https://github.com/kokkos/kokkos/issues/1390#issuecomment-363463835

stanmoore1 avatar May 08 '20 18:05 stanmoore1

And #2276, which is related to @crtrott's third bullet point.

stanmoore1 avatar May 08 '20 18:05 stanmoore1

Via Matt: ScatterView should have a way to tell if it is allocated. Using subview().extent(0) can segfault as it doesn't construct safe with the default ctr.

DavidPoliakoff avatar May 11 '20 13:05 DavidPoliakoff

  • Getting rid of reset_except replace with reset_duplicates
  • add is_allocated to all the data classes.

crtrott avatar May 13 '20 18:05 crtrott

One improvement, and I'm not sure how much it would help, thinking GPU here, if you have a block dimension of (1, 32, 1) you would replicate 32x across the blocks, and use atomics on these data sets. This way you don't have different grids accessing the atomics...

bathmatt avatar Jun 02 '20 22:06 bathmatt

More thoughts:

  • make ScatterView::subview() consistent with View::subview() //semantic mismatch

janciesko avatar Dec 03 '21 19:12 janciesko

Hi @janciesko -- is this issue still on your RADAR?

ajpowelsnl avatar Jun 06 '22 19:06 ajpowelsnl

Here is a bit more on ScatterView, I gonna start tackling this for real:

Let's start with two use-case scenarios: wrapping and non-wrapping. And let us assume we had decent CTAD and that we had an extents() function from mdspan so we can easily compare dimensions.

Wrapping ScatterView

  • Need to scatter into an existing View that is used elsewhere in the code.
  • Do not want overhead (time and memory capacity) of extra allocation on GPUs
void foo(view_type v) {
   ScatterView sv(v);

   // Kernel to do something with it
   parallel_for(....);

   sv.contribute_into(v);
}

Issues:

  • On CPUs with duplication I have allocation and deallocation of the duplicates
  • awkward that I have to pass v again to contribute_into, and what if I pass something else to contribute_into, was v then modified or not?

Version where we keep the ScatterView alive

struct Foo {
   ScatterView<....> sv;
   void compute(view_type v) {
      // the is_allocated is there for all static extents case
      if(!sv.is_allocated || sv.extents() != v.extents()) {
        sv = ScatterView{v};
      } else {
        // set duplicates to zero again if necessary
        // but don't reset `v` since we may want to add to existing values
        sv.reset_except(v);
      }
       
      // Use ScatterView
      parallel_for(...);
      
      sv.contribute_into(v);
   }
};
  • again contribute_into is awkward but now so is reset_except.

Non-Wrapping ScatterView

  • I just need to accumulate contributions most efficiently
  • Use the result in one place but don't keep it around for other code areas
void foo() {
   ScatterView<int**> sv("SV", N);

   // Kernel to do some accumulation
   parallel_for(...);

   // get the "master" View
   View<int**> v = sv.subview();
   sv.contribute_into(v);
   
   // Kernel to do something with v
}

Issues:

  • Uhm subview to get the "master" View?
  • Ill defined what the values of v are before contribute_into
  • awkward that I have to pass v again to contribute_into

Version keeping it around

struct Foo {
   ScatterView<....> sv;
   Foo(...) {
      sv = ScatterView{"Name", N, M};
   }
   void compute() {
       sv.reset();
       
       // do something with sv
       parallel_for(...);
       auto v = sv.subview();
       sv.contribute_into(v);     
       
        // do something with v
        parallel_for(...);
   }
};

Fundamental usage proposal:

  • combine reset_except and contribute_into into a consolidate function
    • accumulates values from duplicates into the "master" view and resets them to neutral
  • add a 'rebind(view)' function to streamline the reuse in the wrapping use-case
    • would enable reusing the same ScatterView for multiple views of compatible layout (and size?)
    • disallow rebind when in "Non-Wrapping" mode.
    • disallow resize and realloc when in "wrapping" mode
  • rename subview to result_view or something similar

Lets discuss interfaces first:

template<class DataType, class Layout, class DeviceType, class Op, class Duplication, class Contribution>
class ScatterView {
   public:
   // Ctors:
   ScatterView() = default;

   template<class ... ViewArgs>
   ScatterView(View<ViewArgs...>);

   template<class ... ViewArgs>
   ScatterView(execution_space, View<ViewArgs...>);

   template<class ... Extents>
   ScatterView(std::string label, Extents .... exts);

   template<class... P, class ... Extents>
   ScatterView(ViewCtorProp<P...>, Extents .... exts);

   
};

crtrott avatar Oct 18 '24 20:10 crtrott

@crtrott, @janciesko -- updated ScatterView documentation is ready for review 😄

ajpowelsnl avatar Oct 18 '24 21:10 ajpowelsnl

Here is the usage for this thing for the above mentioned cases:

Wrapping ScatterView

  • Need to scatter into an existing View that is used elsewhere in the code.
  • Do not want overhead (time and memory capacity) of extra allocation on GPUs
void foo(view_type v) {
   ScatterView sv(v);

   // if you need to zero v out do it here: deep_copy(v, 0)
   // Kernel to do something with it
   parallel_for(....);

   sv.aggregate();
}

Version where we keep the ScatterView alive

struct Foo {
   ScatterView<....> sv;
   void compute(view_type v) {
      // the is_allocated is there for all static extents case
      if(!sv.is_allocated || sv.extents() != v.extents()) {
        sv = ScatterView{v};
      } else {
        sv.rebind(v);
      }
       
      // Use ScatterView
      parallel_for(...);
      
      sv.aggregate();
   }
};

Non-Wrapping ScatterView

  • I just need to accumulate contributions most efficiently
  • Use the result in one place but don't keep it around for other code areas
void foo() {
   ScatterView<int**> sv("SV", N, M);

   // Kernel to do some accumulation
   parallel_for(...);
   sv.aggregate();

   // get the "master" View
   View<int**> v = sv.target_view();
    // Kernel to do something with v
}

Version keeping it around

struct Foo {
   ScatterView<....> sv;
   Foo(...) {
      sv = ScatterView{"Name", N, M};
   }
   void compute() {
       // resetting values
       deep_copy(sv.target_view(), 0);
       // do something with sv
       parallel_for(...);
       sv.aggregate();     
       
        // do something with v
        parallel_for(...);
   }
};

And a compact test case:

template<class SV>
void test_scatter_view(SV as) {
  auto a = as.target_view();
  EXPECT_EQ(a(0),1);

  // ScatterView does not overwrite values in "target_view"
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i%10) += 10;
  });
  as.aggregate();
  EXPECT_EQ(a(0),21);

  // previous aggregate resets potential duplicates but not the whole thing
  // you can run multiple kernels before aggregating
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i%10) += 11;
  });
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i%10) += 12;
  });
  as.aggregate();
  EXPECT_EQ(a(0),67);

  // setting target_view back to 0 means we start from blank slate
  Kokkos::deep_copy(a, 0);
  Kokkos::parallel_for(20, KOKKOS_LAMBDA(int i) {
    auto s = as.access();
    s(i%10) += 13;
  });
  as.aggregate();
  EXPECT_EQ(a(0),26);
}

template<class Layout, class Strategy>
void test_scatter_view() {
  using sv_t = Kokkos::Experimental::
                 ScatterView<int*, Layout, Kokkos::DefaultExecutionSpace,
                             Kokkos::Experimental::ScatterSum, Strategy>;

  // wrapping
  {
    Kokkos::View<int*> a("A", 10);
    Kokkos::deep_copy(a, 1);
    sv_t as(a);
    test_scatter_view(as);
    Kokkos::printf("%i\n",a(0));
  }
  // non-wrapping
  {
    sv_t as("ScatterView", 10);
    auto at = as.target_view();
    Kokkos::deep_copy(at, 1);
    test_scatter_view(as);
  }
}

crtrott avatar Dec 16 '24 19:12 crtrott