Alfio Lazzaro
> > I'm sure ELPA works in multi-gpu. The question is how. In CP2K, there are other libraries that use GPUs (Sirius, COSMA, DBCSR, and more will come). For instance,...
> > In particular @alazzaro I see you recently switched the 32-H2O input file to use Scalapack instead of ELPA, but with a quick fix in ELPA it runs for...
For NCCL, I would only consider it for nodes with multiple GPUs. Note that NCCL itself requires system network support. Furthermore, there is the limitation of a single rank per GPU....
> I've talked to an Nvidia engineer: The NCCL restriction of one MPI rank per GPU won't go away anytime soon. Same for AMD... > > I've also learned that...
Well, the length of the buffers is an int_4 (I mean nze elements, for instance). We need to make changes in other places to get the full benefit of sending more than...
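To make the int_4 limit concrete: a 4-byte signed integer cannot hold a count above 2^31 - 1, so a longer buffer has to be described either with a wider integer or as several pieces. Below is a minimal Python sketch of the second option. The names `INT4_MAX` and `split_count` are hypothetical and only illustrate the arithmetic; this is not how DBCSR handles it.

```python
# Hypothetical illustration only; not DBCSR code.
INT4_MAX = 2**31 - 1  # largest value a 4-byte signed integer can hold


def split_count(total, max_count=INT4_MAX):
    """Split `total` elements into (offset, length) chunks.

    Each chunk length is at most `max_count`, so it fits in a
    4-byte signed integer when the default is used.
    """
    if total < 0 or max_count <= 0:
        raise ValueError("total must be >= 0 and max_count must be > 0")
    chunks = []
    offset = 0
    while offset < total:
        length = min(max_count, total - offset)
        chunks.append((offset, length))
        offset += length
    return chunks
```

Note that the offsets themselves can exceed `INT4_MAX`, so they would still need a wider integer type even if every individual chunk length fits.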
I agree with the change. I was not suggesting moving "everything" to int_8 (yes, a lot of work). My point is that there can be some places where we...
For reference, I've turned MPI allocation off by default for now. It appears to be useless (or even worse) in most cases...
Yes, the issue is still open for that... Just changed the milestone to 2.1...
Well, let's actually keep this issue open for FORD documentation improvements...
- [x] Add DBCSR version to the report
- [x] Add link in the README and wiki
- [ ] Include C/C++ files
- [x] Include .f90 files

Note: currently...