Aurelien Bouteiller
There are 2 different failure modes: 1. stress tries to allocate 33GB of memory, which may or may not be possible, especially on low-end CUDA devices, or as host memory....
There is a third problem: the JDF of the stress tester is not symmetrical:
https://github.com/ICLDisco/parsec/blob/1ababbe248064c5f3deaab2f9b04e56b556a3f02/tests/runtime/cuda/stress.jdf#L106
https://github.com/ICLDisco/parsec/blob/1ababbe248064c5f3deaab2f9b04e56b556a3f02/tests/runtime/cuda/stress.jdf#L128

## Buggy behavior

This occasionally causes the following behavior:

> flow GEMM(1,1,0) B data._f_B.source_repo...
doing the merge now
The following command produces the correct binding for testing (as witnessed from hwloc-ls, the binding is correctly restricted by mpiexec to the correct sockets).

```
PARSEC_MCA_device_cuda_memory_use=10 salloc -N1 -whexane...
```
It looks like the message above is partially misleading: we initialize the vpmap before we extract the ALLOWED mask, so vpmap_from_flat initializes something that is not the same as...
Looks like the inherited binding is correct (aside from the problem with the vpmap above): comparing `--bind-to none -c 8` vs `--bind-to socket` results in double the performance and using 16 hw...
vpmap initialization creates and fills `parsec_vpmap[vp].threads[t+ht].cpuset = HWLOC_ALLOC();` with all sorts of intricate things (that do not abide by the restricted mask), but these are write-only variables. ~~At...
Please reopen if you hard-disagree with the following malloc-like API.
**Original comment by George Bosilca (Bitbucket: [bosilca](https://bitbucket.org/bosilca), GitHub: [bosilca](https://github.com/bosilca)).**

----------------------------------------

We need to revisit this ASAP.
Removing version: 3.0.0 (automated comment)