vitesselin
Dear Alazzaro, 1. Changing the code that you mentioned in #229 does not fix this issue: `CALL acc_devmem_setzero_bytes(area%d%acc_devmem, s*l+1, s*u, area%d%memory_type%acc_stream)` 2. It still hit...
Yes, one node with 2 GPU cards.
@alazzaro The problem still occurs. I also printed out the length and offset for it. ``` > test_name large_blocks_1 > numthreads 2 > numnodes 2 > matrix_sizes 500...
@oschuett Yes, it looks similar to #205. I observed that DBCSR only launches kernels on the first device, even when I have two or more. Furthermore, DBCSR allocated...
@alazzaro > I assume that 1 thread the code should work fine? Yes, the issue is always found in process 2 with thread id 1. For the CUDA threading problem,...
@alazzaro > Concerning your idea of peer2peer access right for each GPU, yes, it can be doable by setting > > export CUDA_VISIBLE_DEVICES=0 --> rank 0 > export CUDA_VISIBLE_DEVICES=1 -->...
@alazzaro Is there any plan for DBCSR to support multi-GPU on one node in the near future?
@alazzaro You mentioned this before. > the way DBCSR uses the multi-gpu is via multi-ranks, where each rank is attached to a GPU, this is done in a round-robin fashion....
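For readers following the thread, the round-robin attachment of ranks to GPUs quoted above could be reproduced outside the library roughly as follows. This is only a sketch under stated assumptions, not DBCSR's own code: `pick_gpu` and `NUM_GPUS` are hypothetical names introduced here, and the node-local rank is assumed to come from Open MPI (`OMPI_COMM_WORLD_LOCAL_RANK`) or Slurm (`SLURM_LOCALID`). `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable.

```shell
#!/bin/sh
# Hypothetical helper (not part of DBCSR): map a node-local rank to a GPU
# index in round-robin order.
#   $1 = node-local rank, $2 = number of GPUs on the node
pick_gpu() {
    echo $(( $1 % $2 ))
}

# Assumption: the launcher exports a node-local rank. Open MPI and Slurm
# variables are tried in turn; fall back to 0 if neither is set.
local_rank="${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}"

# Assumption: NUM_GPUS is a user-provided variable; it defaults to 2 here,
# matching the one-node, two-card setup discussed in this thread.
num_gpus="${NUM_GPUS:-2}"

# Restrict this rank to a single device, so that it sees it as device 0.
CUDA_VISIBLE_DEVICES="$(pick_gpu "$local_rank" "$num_gpus")"
export CUDA_VISIBLE_DEVICES
```

To use this as a launcher wrapper, one would typically add `exec "$@"` as the last line and start the application through the script, one instance per rank. With two ranks and two GPUs, rank 0 would see only GPU 0 and rank 1 only GPU 1.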