
I need to create a pod with multiple containers, but Volcano doesn't support limiting gpu-memory for each container.

Open MichaelHuang88 opened this issue 3 years ago • 15 comments

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  schedulerName: volcano
  containers:
    - name: container1
      image: XXX/XXX
      resources:
        limits:
          volcano.sh/gpu-memory: 1024
    - name: containerN
      image: XXX/XXX
      resources:
        limits:
          volcano.sh/gpu-memory: 1024

https://github.com/volcano-sh/volcano/issues/1858

MichaelHuang88 avatar Oct 22 '22 12:10 MichaelHuang88

Could you please give more details about the job and Volcano configurations? It should work with the GPU-share feature if you follow the user guide.

jiangkaihua avatar Oct 24 '22 02:10 jiangkaihua

You can find more info in https://github.com/volcano-sh/volcano/issues/1858. Volcano's syntax only supports a gpu-memory limit on one container per pod; the other containers in the pod, which have no gpu-memory limit, fail to start, because of https://www.yuque.com/docs/share/62f95eb0-5dfe-4045-9a8a-f63c9341780c?#
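To make it easier to see which containers in a pod declare the limit, here is a minimal Python sketch that inspects a manifest shaped like the one in this issue. It only reads the manifest and does not reproduce Volcano's scheduling behavior. The helper names `gpu_memory_limits` and `summarize` are hypothetical, and the image names are the placeholders from the original post.

```python
GPU_MEMORY = "volcano.sh/gpu-memory"

# Pod manifest mirroring the one posted in this issue (images are placeholders).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-pod"},
    "spec": {
        "schedulerName": "volcano",
        "containers": [
            {"name": "container1", "image": "XXX/XXX",
             "resources": {"limits": {GPU_MEMORY: 1024}}},
            {"name": "containerN", "image": "XXX/XXX",
             "resources": {"limits": {GPU_MEMORY: 1024}}},
        ],
    },
}


def gpu_memory_limits(pod):
    """Map each container name to its gpu-memory limit, or None if unset."""
    result = {}
    for container in pod["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        result[container["name"]] = limits.get(GPU_MEMORY)
    return result


def summarize(pod):
    """Split containers by whether they set the limit and total the requests."""
    limits = gpu_memory_limits(pod)
    with_limit = [name for name, value in limits.items() if value is not None]
    without_limit = [name for name, value in limits.items() if value is None]
    total = sum(int(limits[name]) for name in with_limit)
    return {"with_limit": with_limit, "without_limit": without_limit, "total": total}


if __name__ == "__main__":
    print(summarize(pod))
```

Attaching this kind of summary to a report shows at a glance whether every container, or only one, carries the `volcano.sh/gpu-memory` limit.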

MichaelHuang88 avatar Oct 31 '22 04:10 MichaelHuang88

> Could you please give more details about the job & volcano configurations? It should work by using GPU-share features following user guide.

Can you help me? Thank you very much.

MichaelHuang88 avatar Nov 11 '22 02:11 MichaelHuang88

Hi! I would like to contribute to this issue. Kindly guide me through the process.

aarushisoni avatar Nov 14 '22 14:11 aarushisoni

A Pod has multiple containers. An error is reported when a container sets the GPU limit:

resources:
  limits:
    volcano.sh/gpu-memory: 1024

GLL550C avatar Nov 18 '22 08:11 GLL550C

I will try to reproduce and fix this issue.

/assign

hwdef avatar Nov 23 '22 03:11 hwdef

This issue has been successfully reproduced in volcano 1.16 and will be resolved in the future.

hwdef avatar Dec 04 '22 11:12 hwdef

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Mar 18 '23 20:03 stale[bot]

Retain the current issue.

/remove-lifecycle stale

wangyang0616 avatar Mar 24 '23 09:03 wangyang0616

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Aug 10 '23 01:08 stale[bot]

> This issue has been successfully reproduced in volcano 1.16 and will be resolved in the future.

Any progress here?

Monokaix avatar May 14 '24 03:05 Monokaix

> This issue has been successfully reproduced in volcano 1.16 and will be resolved in the future.

When will it be resolved? Thanks.

MichaelHuang88 avatar Jul 17 '24 08:07 MichaelHuang88