Parla.py
Not setting memory argument can cause crashes
When creating tasks with @spawn, if the memory argument isn't set, the scheduler seems to assume the task uses no memory at all. As a result, it can keep placing tasks on a device until the device runs out of memory and crashes. I recommend that, by default, if neither memory nor vcus is set, only one task be allowed to run per device at a time (particularly for GPUs).
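To make the suggestion concrete, here is a rough sketch of the default I have in mind. It is a toy model, not Parla's scheduler: TaskRequest, Device, effective_request, and try_admit are hypothetical names made up for illustration, and I'm assuming resources are tracked as free memory in bytes plus a vcus fraction per device.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class TaskRequest:
    """Hypothetical stand-in for the resources a task declares at spawn time."""
    name: str
    memory: Optional[int] = None   # bytes; None means the argument was not set
    vcus: Optional[float] = None   # fraction of a device; None means not set


@dataclass
class Device:
    """Hypothetical device with simple resource bookkeeping."""
    memory_capacity: int
    free_memory: int = field(default=0, init=False)
    free_vcus: float = 1.0

    def __post_init__(self) -> None:
        # A fresh device starts with all of its memory available.
        self.free_memory = self.memory_capacity


def effective_request(task: TaskRequest, device: Device) -> Tuple[int, float]:
    """Return the (memory, vcus) the scheduler should reserve for a task."""
    if task.memory is None and task.vcus is None:
        # Proposed default: an unannotated task reserves the whole device,
        # so at most one such task runs per device at a time.
        return device.memory_capacity, 1.0
    # If only one of the two is set, treat the other as zero.
    return task.memory or 0, task.vcus or 0.0


def try_admit(task: TaskRequest, device: Device) -> bool:
    """Reserve resources and return True if the task fits, else return False."""
    memory, vcus = effective_request(task, device)
    if memory > device.free_memory or vcus > device.free_vcus:
        return False
    device.free_memory -= memory
    device.free_vcus -= vcus
    return True
```

With this default, a second unannotated task is held back until the first one releases the device, while tasks that do declare memory can still share a device up to its capacity.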
Simple repro on Frontera:
cd Parla.py/benchmarks/qr_factorization
Replace this line
@spawn(taskid=T1[i], placement=PLACEMENT, memory=T1_MEMORY)
with
@spawn(taskid=T1[i], placement=PLACEMENT)
then run
python qr_parla.py -r 1600000 -c 1000 -b 100000 -i 1 -p gpu -g 1