Goran Flegar comments

Results 36 comments of



Full-Sized Ginkgo Tutorial

I wouldn't say that a "full-sized tutorial" can fit into 60-90 minutes. There is a [TOC for a tutorial in the wiki](https://github.com/ginkgo-project/ginkgo/wiki/Tutorial:-Building-a-2D-Poisson-Solver) that covers all of Ginkgo's major features, but...

Code generation macros impairing easy code navigation and analysis

# Note to everyone commenting here

Before forming an opinion on this, please try writing some of the code in Ginkgo __without__ using macros. Most of them are there precisely...

Code generation macros impairing easy code navigation and analysis

> A central part is that the code has to be understandable for outsiders.

You do realize that this depends on what level of knowledge we assume the "outsider" has...

Code generation macros impairing easy code navigation and analysis

@tcojean I assume you have a concrete tool which is giving you problems with macros. Just out of curiosity, is that tool capable of finding the definitions of methods exposed...

Code generation macros impairing easy code navigation and analysis

Btw, I just tried this in vim with the [YCM](http://valloric.github.io/YouCompleteMe/) plugin (it uses clang to provide semantics). In `core/solver/cg.hpp`, when positioning the cursor over `parameters_` on line 123 and issuing...

Code generation macros impairing easy code navigation and analysis

BTW, I think [this](https://github.com/Valloric/YouCompleteMe/issues/2938) is the reason why go to definition doesn't work with templates in YCM. So it seems it can be made to work by changing a single...

Preconditioner overhead

There should be only a single copy, with no allocation and no free. I can't see where it's coming from right now. I think I found the problem. Here is what...

Preconditioner overhead

I would first determine if that copy is even a bottleneck or not. What we were measuring before is allocation + copy + free. If you look at the timeline...

Preconditioner overhead

```
step1: 3n reads + 1n writes -> 47 us (778 GB/s)
precond. copy: 1n reads + 1n writes -> 30 us (609 GB/s)
spmv (csr?): 3/2(nnz + n) reads...
```
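The bandwidth figures quoted for `step1` and the preconditioner copy can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes 8-byte (double-precision) values and 1 GB = 1e9 bytes, neither of which is stated in the comment, and the helper names `effective_bandwidth_gbs` and `implied_n` are made up for this example.

```python
# Assumption: every value moved is an 8-byte double; the comment does not say so.
BYTES_PER_VALUE = 8


def effective_bandwidth_gbs(values_moved, seconds, bytes_per_value=BYTES_PER_VALUE):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a kernel that
    moves `values_moved` values in `seconds`."""
    return values_moved * bytes_per_value / seconds / 1e9


def implied_n(values_per_entry, seconds, bandwidth_gbs, bytes_per_value=BYTES_PER_VALUE):
    """Vector length n implied by a kernel that moves `values_per_entry * n`
    values in `seconds` at `bandwidth_gbs` GB/s."""
    return bandwidth_gbs * 1e9 * seconds / (values_per_entry * bytes_per_value)


# step1: 3n reads + 1n writes -> 47 us (778 GB/s)
n_step1 = implied_n(4, 47e-6, 778)
# precond. copy: 1n reads + 1n writes -> 30 us (609 GB/s)
n_copy = implied_n(2, 30e-6, 609)

print(f"n implied by step1: {n_step1:.0f}")
print(f"n implied by copy:  {n_copy:.0f}")
print(f"copy bandwidth recomputed: {effective_bandwidth_gbs(2 * n_copy, 30e-6):.0f} GB/s")
```

Under these assumptions both lines imply a vector length of roughly 1.14 million entries, so the two quoted bandwidths are at least consistent with each other. A different value size would change the implied `n` but not that agreement.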

Preconditioner overhead

Just ran `1_Utilities/p2pBandwidthLatencyTest` on JUWELS. Intra-device copy should achieve around 750 GB/s. Note, though, that we're using `cudaMemcpyPeer`, whereas the CUDA sample uses `cudaMemcpyPeerAsync` to do the copy. Detailed results...