Aurelien Bouteiller

146 comments by Aurelien Bouteiller

I have a follow-up commit that will do the same for the `ze` component, but I can't test it atm, so I'll bring it later.

Will check that it doesn't break dplasma, then merge.

@G-Ragghianti we don't have any Windows runners, do we?

stress:gpu failed; I think that's new. Got this with `PMIX_MCA_psec='' SLURM_TIMELIMIT=100 srun -wleconte -n 1 --gpus-per-task=1 ctest -R stress --verbose`:
```
51: stress: /home/bouteill/parsec/dplasma/parsec/parsec/mca/device/device_gpu.c:891: int parsec_device_data_reserve_space(parsec_device_gpu_module_t *, parsec_gpu_task_t *):...
```

Resolved with https://github.com/ICLDisco/parsec/pull/694/commits/c5abdbf3309fbca07f72af4fd4619319ea813cc8
wip: lldb output from `lldb` seems to indicate that we need to re-zero dsl-tasks after first use (dplasma 2gpu will fail rarely and randomly with this stack...

New problem: all tests with 2gpus appear to leak/over-retain gpu memory:
```
237/441 Test #237: dplasma_cgemm_2gpu_cuda_shm .....................................***Failed 1.78 sec
W@00000 /!\ DEBUG LEVEL WILL PROBABLY REDUCE THE PERFORMANCE OF THIS...
```

> I saw a notification with a lldb stack trace but I can't find it here. Weird! Anyway, it seemed to indicate that the `complete_stage` was pointing to an incorrect...

Yes, the plan is to review and ideally transition all 'maybes' to 'yes'. They are pretty simple and a quick review should get us there.

We did a review and are now ready to create a PR that promotes them as needed.

Why is this needed? On master (w/o this PR), setting the environment variable already gives fine-grained control over the allotment of devices to the parsec processes. What do we gain...