Denis Demidov
Yes. Feel free to provide a pull request :).
So I looked at the MBA implementation a bit more closely. Now I know why I decided to stay on the CPU for the initialization. First, and least important, VexCL supports parallel...
Ok, I _did_ something wrong here. After replacing the critical section with an atomic in ecad92c, the `mba_benchmark` output looks like this: ``` 1. Capeverde (AMD Accelerated Parallel Processing) surf(0.5, 0.5) = -4.48714e-05...
There are no host-allocated structures in the snippet you provided. Both `vex::multivector` and `vex::vector` are device structures. And those are allocated directly. If what you meant to ask is if...
Regarding the MBA algorithm, if you have a closer look at `BA algorithm` on page 4 of the paper, you'll notice that there is an outer loop over scattered data...
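To make the shape of that loop concrete, here is a minimal serial sketch of a BA-style step as it is usually described for B-spline approximation of scattered data. It is not the VexCL code: `Point`, `Lattice`, `ba_fit` and `ba_eval` are hypothetical names, and the sketch assumes a unit-spaced lattice over `[0,m) x [0,n)` stored with an index offset of one.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };  // hypothetical scattered data point

// Uniform cubic B-spline basis functions B_0..B_3 at t in [0,1).
std::array<double, 4> bspline_basis(double t) {
    const double t2 = t * t, t3 = t2 * t;
    std::array<double, 4> b = {{
        (1 - t) * (1 - t) * (1 - t) / 6,
        (3 * t3 - 6 * t2 + 4) / 6,
        (-3 * t3 + 3 * t2 + 3 * t + 1) / 6,
        t3 / 6
    }};
    return b;
}

// Hypothetical control lattice: (m+3) x (n+3) values, row-major.
struct Lattice {
    std::size_t m, n;
    std::vector<double> phi;
    Lattice(std::size_t m_, std::size_t n_)
        : m(m_), n(n_), phi((m_ + 3) * (n_ + 3), 0.0) {}
    std::size_t idx(std::size_t i, std::size_t j) const { return i * (n + 3) + j; }
};

Lattice ba_fit(const std::vector<Point>& pts, std::size_t m, std::size_t n) {
    Lattice lat(m, n);
    std::vector<double> delta(lat.phi.size(), 0.0), omega(lat.phi.size(), 0.0);

    // Outer loop over the scattered data: each point scatters into a
    // 4x4 neighbourhood of control values.
    for (const Point& p : pts) {
        assert(p.x >= 0 && p.x < m && p.y >= 0 && p.y < n);
        const std::size_t i = static_cast<std::size_t>(std::floor(p.x));
        const std::size_t j = static_cast<std::size_t>(std::floor(p.y));
        const std::array<double, 4> bs = bspline_basis(p.x - i);
        const std::array<double, 4> bt = bspline_basis(p.y - j);

        double sum = 0;  // sum of squared weights for this point
        for (int k = 0; k < 4; ++k)
            for (int l = 0; l < 4; ++l)
                sum += (bs[k] * bt[l]) * (bs[k] * bt[l]);

        for (int k = 0; k < 4; ++k) {
            for (int l = 0; l < 4; ++l) {
                const double w = bs[k] * bt[l];
                const double phi_c = w * p.z / sum;
                // Different points may hit the same node here.
                delta[lat.idx(i + k, j + l)] += w * w * phi_c;
                omega[lat.idx(i + k, j + l)] += w * w;
            }
        }
    }

    // Second pass over the lattice: normalise the accumulated values.
    for (std::size_t q = 0; q < lat.phi.size(); ++q)
        lat.phi[q] = (omega[q] != 0) ? delta[q] / omega[q] : 0.0;
    return lat;
}

// Evaluate the resulting bicubic B-spline surface at (x, y).
double ba_eval(const Lattice& lat, double x, double y) {
    const std::size_t i = static_cast<std::size_t>(std::floor(x));
    const std::size_t j = static_cast<std::size_t>(std::floor(y));
    const std::array<double, 4> bs = bspline_basis(x - i);
    const std::array<double, 4> bt = bspline_basis(y - j);
    double f = 0;
    for (int k = 0; k < 4; ++k)
        for (int l = 0; l < 4; ++l)
            f += bs[k] * bt[l] * lat.phi[lat.idx(i + k, j + l)];
    return f;
}
```

The accumulation into `delta` and `omega` is a scatter: nearby points write to overlapping lattice nodes, so a parallel version of the outer loop has to protect those updates, for example with atomics.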
Ok, the question about data allocation is a lot clearer now. You could do this: ``` .cpp vex::vector x(ctx, n), y(ctx, n), z(ctx, n); auto i = vex::element_index(); x =...
Regarding a K-way split of input data, how would you do it on CPU? Would each core skip points that do not belong to its subdomain? Or would you do...
Regarding the deallocation: `vex::vector`s behave the same way `std::vector`s do. They deallocate themselves when they go out of scope. That's an example of the [RAII](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization) idiom.
No, you just let them go out of scope. No memory will leak.
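Here is a small sketch of what "going out of scope" means in practice. It uses a hypothetical `Buffer` class as a stand-in for a device container, so it does not depend on VexCL. The instance counter is there only to make the cleanup observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for an owning container such as vex::vector.
// It acquires storage in the constructor and releases it in the destructor.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new double[n]()), size_(n) { ++live_; }
    ~Buffer() { --live_; }  // data_ frees its storage automatically after this

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    std::size_t size() const { return size_; }
    static int live() { return live_; }  // number of buffers currently alive

private:
    std::unique_ptr<double[]> data_;
    std::size_t size_;
    static int live_;
};

int Buffer::live_ = 0;

// Returns how many buffers are alive while x, y and z are in scope.
int live_inside_scope(std::size_t n) {
    Buffer x(n), y(n), z(n);
    return Buffer::live();
}  // x, y and z are destroyed here, no explicit free is needed
```

Calling `live_inside_scope(16)` reports three live buffers, and `Buffer::live()` is back to zero once the function returns.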
The technique described in the paper by Harrison et al. (and domain decomposition in general) is suitable for fat cluster nodes or CPU cores. This is an example of coarse-grain...