arena
How to set dshm size for training?
When I submitted a PyTorchJob with arena, I couldn't find any parameter for the shared memory size, which is very important for PyTorch training.
The size is fixed at 2Gi:
...
    - mountPath: /dev/shm
      name: dshm
...
...
  - emptyDir:
      medium: Memory
      sizeLimit: 2Gi
    name: dshm
...
Does anyone know how to set the dshm size?
OK, I found a workaround. In the file /charts/pytorchjob/values.yaml, change:
shmSize: 2Gi
to
shmSize: 64Gi # or any value you want
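For anyone who prefers to script the edit, here is a minimal shell sketch of the same workaround. It assumes the values file has a top-level `shmSize:` line as quoted above. `set_shm_size` is a hypothetical helper name, not something arena provides.

```shell
#!/bin/sh
# Sketch only: set_shm_size is a hypothetical helper, not part of arena.
# It rewrites the top-level "shmSize:" line in a chart values file.
set_shm_size() {
    values_file="$1"   # e.g. /charts/pytorchjob/values.yaml
    new_size="$2"      # e.g. 64Gi

    # Keep a .bak copy of the file, then replace only the shmSize line.
    sed -i.bak "s|^shmSize:.*|shmSize: ${new_size}|" "$values_file"

    # Print the resulting line so the change can be checked by eye.
    grep '^shmSize:' "$values_file"
}

# Example, using the path from this thread (uncomment to run):
# set_shm_size /charts/pytorchjob/values.yaml 64Gi
```

After resubmitting the job, running `df -h /dev/shm` inside the training container should show the new size. Keep in mind that a memory-backed emptyDir counts against the container's memory limit, so a very large value may need a matching memory request.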
Same issue