Goran Flegar

36 comments by Goran Flegar

I wouldn't say that a "full-sized tutorial" can fit into 60-90 minutes. There is a [TOC for a tutorial in the wiki](https://github.com/ginkgo-project/ginkgo/wiki/Tutorial:-Building-a-2D-Poisson-Solver) that covers all of Ginkgo's major features, but...

# Note to everyone commenting here

Before forming an opinion on this, please try writing some of the code in Ginkgo __without__ using macros. Most of them are there precisely...

> A central part is that the code has to be understandable for outsiders.

You do realize that this depends on what level of knowledge we assume the "outsider" has...

@tcojean I assume you have a concrete tool which is giving you problems with macros. Just out of curiosity, is that tool capable of finding the definitions of methods exposed...

Btw, I just tried this in vim with the [YCM](http://valloric.github.io/YouCompleteMe/) plugin (it uses clang to provide semantics). In `core/solver/cg.hpp`, when positioning the cursor over `parameters_` on line 123 and issuing...

BTW, I think [this](https://github.com/Valloric/YouCompleteMe/issues/2938) is the reason why go to definition doesn't work with templates in YCM. So it seems it can be made to work by changing a single...

There should be only a single copy, with no allocation and no free; I can't see where it's coming from right now. I think I found the problem. Here is what...

I would first determine if that copy is even a bottleneck or not. What we were measuring before is allocation + copy + free. If you look at the timeline...

```
step1: 3n reads + 1n writes -> 47 us (778 GB/s)
precond. copy: 1n reads + 1n writes -> 30 us (609 GB/s)
spmv (csr?): 3/2(nnz + n) reads...
```
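For anyone who wants to sanity-check the figures quoted for `step1` and the preconditioner copy, here is a minimal sketch of the arithmetic behind them. The vector length `n` is not stated in the excerpt, so the sketch backs it out from each line instead of assuming it. The 8-byte value size (double precision) and the decimal definition of GB are assumptions, and the function names are made up for illustration.

```python
def implied_n(values_per_entry, elapsed_us, bandwidth_gbs, value_bytes=8):
    """Vector length n implied by a kernel moving values_per_entry * n values.

    value_bytes=8 assumes double precision (an assumption, not from the source);
    1 GB is taken as 1e9 bytes.
    """
    bytes_moved = bandwidth_gbs * 1e9 * elapsed_us * 1e-6
    return bytes_moved / (values_per_entry * value_bytes)


def bandwidth_gbs(values_per_entry, n, elapsed_us, value_bytes=8):
    """Effective bandwidth in GB/s for moving values_per_entry * n values."""
    bytes_moved = values_per_entry * n * value_bytes
    return bytes_moved / (elapsed_us * 1e-6) / 1e9


# step1: 3n reads + 1n writes -> 47 us (778 GB/s)
n_step1 = implied_n(4, 47, 778)
# precond. copy: 1n reads + 1n writes -> 30 us (609 GB/s)
n_copy = implied_n(2, 30, 609)

# Relative disagreement between the two implied vector lengths.
rel_diff = abs(n_step1 - n_copy) / n_copy
print(f"implied n: {n_step1:.0f} vs {n_copy:.0f}, relative difference {rel_diff:.2%}")
```

Both lines imply a vector length of about 1.14 million, so the two quoted bandwidths are mutually consistent. The agreement does not depend on the assumed value size, since `value_bytes` cancels when the two estimates are compared.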

Just ran `1_Utilities/p2pBandwidthLatencyTest` on juwels. Intra-device copy should achieve around 750 GB/s. Note though, that we're using `cudaMemcpyPeer` and the CUDA sample uses `cudaMemcpyPeerAsync` to do the copy. Detailed results...