
Not setting memory argument can cause crashes

Open • sestephens73 opened this issue 3 years ago • 1 comment

When creating tasks with @spawn, if the memory argument isn't set, the scheduler seems to assume the task uses no memory. As a result, it can place enough tasks on a single device to exhaust that device's memory and cause it to crash. I recommend that, by default, if neither memory nor vcus is set, only one task at a time can run per device (particularly for GPUs).
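To make the proposed default concrete, here is a minimal sketch of an admission check. It is a toy model and not Parla's scheduler: `Task`, `Device`, `effective_requirements`, `try_admit`, and `admit_all` are hypothetical names invented for illustration, and the memory units are arbitrary. The only idea taken from this issue is that a task declaring neither memory nor vcus should claim the whole device.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    # Hypothetical stand-in for a spawned task's resource hints.
    name: str
    memory: Optional[int] = None    # None means "not declared"
    vcus: Optional[float] = None    # fraction of a device; None means "not declared"


@dataclass
class Device:
    # Hypothetical stand-in for a device's remaining resources.
    free_memory: int
    free_vcus: float = 1.0


def effective_requirements(task: Task) -> Tuple[int, float]:
    """Return the (memory, vcus) the toy scheduler should reserve for a task."""
    if task.memory is None and task.vcus is None:
        # Proposed default: nothing declared, so claim the whole device
        # and prevent any other task from sharing it.
        return 0, 1.0
    # Assumption of this sketch: a hint that is missing while the other
    # one is present counts as zero.
    return task.memory or 0, task.vcus or 0.0


def try_admit(task: Task, device: Device) -> bool:
    """Reserve resources on the device if the task fits; report success."""
    memory, vcus = effective_requirements(task)
    if memory > device.free_memory or vcus > device.free_vcus:
        return False
    device.free_memory -= memory
    device.free_vcus -= vcus
    return True


def admit_all(tasks: List[Task], device: Device) -> List[str]:
    """Return the names of the tasks that can run concurrently on the device."""
    return [t.name for t in tasks if try_admit(t, device)]


# Undeclared tasks: only one runs at a time.
undeclared = [Task("a"), Task("b"), Task("c")]
print(admit_all(undeclared, Device(free_memory=16)))

# Declared tasks: admitted until the device's memory would be exceeded.
declared = [Task("a", memory=6), Task("b", memory=6), Task("c", memory=6)]
print(admit_all(declared, Device(free_memory=16)))
```

With this policy, tasks that declare nothing are serialized on the device, while tasks that declare memory are packed only up to the device's capacity. Whether the real scheduler should express the default through vcus, memory, or a separate exclusivity flag is a design question this sketch does not settle.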

Simple repro on Frontera:

cd Parla.py/benchmarks/qr_factorization

Replace this line

@spawn(taskid=T1[i], placement=PLACEMENT, memory=T1_MEMORY)

with

@spawn(taskid=T1[i], placement=PLACEMENT)

then run

python qr_parla.py -r 1600000 -c 1000 -b 100000 -i 1 -p gpu -g 1

sestephens73 • Apr 05 '21 09:04