Kihiro Bando

Results 14 comments of Kihiro Bando

Yes, I would be interested in that. I am on 3.3.00 for both kokkos and kokkos-kernels; would this work? Does it also give detailed metrics (kind of like nvprof) or...

So the actual code I am writing must repeat the parallel-for? Also, I would be interested in the training material you mentioned earlier.

Is it supposed to be a pain to build? I am running through these dependencies: Apollo > callpath > adept-utils > llnl-hires-times. And adept-utils complains that I don't have boost...

Ok, so I ran with Apollo, which output its .yaml files, and then ran a second time using nvprof with the .yaml files in the current directory. It ran with a...

So I was hoping my use case would be precise enough to restrict the number of performance knobs. More precisely (see also the original post of this...

No, my views are imposed by other aspects of the code, which makes it sub-optimal for the operation I am describing. I am aware of this and am trying to...

>Is it possible for you to use a temporary view C to hold A*B and copy C to B at the end? How slow is it? How different is that...

If I allocate space for a `C`, then I don't need to copy back to `B`, so that's not a problem.

Using another view is OK. I would like to avoid allocating it. But currently, the performance of this kernel is pretty bad compared to the rest of the code so...