Goran Flegar
I wouldn't say that a "full-sized tutorial" can fit into 60-90 minutes. There is a [TOC for a tutorial in the wiki](https://github.com/ginkgo-project/ginkgo/wiki/Tutorial:-Building-a-2D-Poisson-Solver) that covers all of Ginkgo's major features, but...
# Note to everyone commenting here

Before forming an opinion on this, please try writing some of the code in Ginkgo __without__ using macros. Most of them are there precisely...
> A central part is that the code has to be understandable for outsiders.

You do realize that this depends on what level of knowledge we assume the "outsider" has...
@tcojean I assume you have a concrete tool which is giving you problems with macros. Just out of curiosity, is that tool capable of finding the definitions of methods exposed...
Btw, I just tried this in vim with the [YCM](http://valloric.github.io/YouCompleteMe/) plugin (it uses clang to provide semantics). In `core/solver/cg.hpp`, when positioning the cursor over `parameters_` on line 123 and issuing...
BTW, I think [this](https://github.com/Valloric/YouCompleteMe/issues/2938) is the reason why go to definition doesn't work with templates in YCM. So it seems it can be made to work by changing a single...
There should be only a single copy, with no allocation and no free; I can't see where it's coming from right now. I think I found the problem. Here is what...
I would first determine if that copy is even a bottleneck or not. What we were measuring before is allocation + copy + free. If you look at the timeline...
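The distinction between timing "allocation + copy + free" and timing the copy alone can be sketched on the host side. This is only an analogy under stated assumptions: the thread concerns device copies, while the sketch below uses plain `std::vector` buffers and `std::chrono`, and the function names `time_alloc_copy_free_us` and `time_copy_only_us` are hypothetical, not part of Ginkgo.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: average time (microseconds) of one
// allocation + copy + free cycle. Copy-constructing the vector allocates
// and copies; its destructor frees at the end of each iteration.
double time_alloc_copy_free_us(const std::vector<double>& src, int reps)
{
    assert(reps > 0 && !src.empty());
    volatile double sink = 0.0;  // discourages the optimizer from dropping the copy
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        std::vector<double> tmp(src);
        sink = tmp[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::micro>(stop - start).count() / reps;
}

// Hypothetical helper: average time (microseconds) of the copy alone,
// into a destination buffer that was allocated once beforehand.
double time_copy_only_us(const std::vector<double>& src,
                         std::vector<double>& dst, int reps)
{
    assert(reps > 0 && dst.size() == src.size());
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        std::copy(src.begin(), src.end(), dst.begin());
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count() / reps;
}
```

Comparing the two averages for the same buffer size gives a rough idea of how much of the earlier measurement was the copy itself. For device memory, the equivalent comparison would have to be read off the profiler timeline or measured with device-side timers.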
```
step1: 3n reads + 1n writes -> 47 us (778 GB/s)
precond. copy: 1n reads + 1n writes -> 30 us (609 GB/s)
spmv (csr?): 3/2(nnz + n) reads...
```
Just ran `1_Utilities/p2pBandwidthLatencyTest` on JUWELS. Intra-device copy should achieve around 750 GB/s. Note, though, that we're using `cudaMemcpyPeer`, whereas the CUDA sample uses `cudaMemcpyPeerAsync` to do the copy. Detailed results...
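The GB/s figures in the per-step estimates above follow from dividing the bytes moved by the measured time. A minimal sketch of that arithmetic is given here; the function name `effective_bandwidth_gbs` is hypothetical, and the vector length and value size are parameters because the excerpt does not state them. (Both the 47 us and the 30 us lines work out to roughly 9.14 MB per vector, which would correspond to about 1.14 million 8-byte values, but that is an inference, not something stated in the thread.)

```cpp
#include <cassert>
#include <cstddef>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) of a step that performs
// `accesses_per_element` reads and writes on each of `n` elements of
// `bytes_per_element` bytes, completed in `time_us` microseconds.
double effective_bandwidth_gbs(double accesses_per_element, std::size_t n,
                               std::size_t bytes_per_element, double time_us)
{
    assert(time_us > 0.0);
    const double bytes = accesses_per_element * static_cast<double>(n) *
                         static_cast<double>(bytes_per_element);
    return bytes / (time_us * 1e-6) / 1e9;
}
```

For example, a step with 3n reads + 1n writes uses `accesses_per_element = 4.0`, and a plain copy (1n reads + 1n writes) uses `2.0`. Comparing the result against the roughly 750 GB/s intra-device copy figure indicates how close a step is to the achievable memory bandwidth.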