volcano
RFVE: Support GPU topology
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature /area scheduling /priority important-soon
Description:
GPU topology is important to the performance of running tasks; it is necessary to improve both the scheduler and the kubelet for GPU topology.
/cc @carmark @Jeffwan
I found one relevant project in this year's Google Summer of Code: https://summerofcode.withgoogle.com/archive/2019/projects/6336863634194432/
@k82cn @Rui-Tang You may refer to this project as an example.
@carmark , thanks very much for the info :) I'd like to build something based on that example :)
/kind rfve
I was taking days off recently and didn't get a chance to attend the meeting.
Topology Manager integration with device plugins is in the latest Kubernetes. Do we plan to leverage this or do something different? https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager
@Jeffwan The official Topology Manager for device plugins only covers devices with socket information. For other devices and topologies, we may need more information and more fine-grained scheduling for better performance. For GPUs, for example: should we select two GPUs in the same socket, under a PIX connection, or under a PHB connection?
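To make the PIX versus PHB question concrete, here is a minimal sketch of picking the closest pair among the free GPUs on a node. The `Link` ranking follows the labels printed by `nvidia-smi topo -m`; the `Topology` type and `ClosestPair` function are hypothetical names for illustration and are not part of Volcano or the kubelet.

```go
package main

import "fmt"

// Link ranks how two GPUs are connected, from closest to farthest.
// The labels mirror the legend of `nvidia-smi topo -m`.
type Link int

const (
	PIX  Link = iota // path crosses at most a single PCIe bridge
	PXB              // path crosses multiple PCIe bridges
	PHB              // path crosses a PCIe host bridge (typically the CPU)
	NODE             // path crosses host bridges within one NUMA node
	SYS              // path crosses the interconnect between NUMA nodes
)

// Topology (hypothetical) maps an unordered GPU pair to its link type.
type Topology map[[2]int]Link

// key normalizes a pair so that (a, b) and (b, a) share one entry.
func key(a, b int) [2]int {
	if a > b {
		a, b = b, a
	}
	return [2]int{a, b}
}

// Set records the link between GPUs a and b.
func (t Topology) Set(a, b int, l Link) { t[key(a, b)] = l }

// ClosestPair returns the two free GPUs with the closest known link.
// ok is false when no pair with a known link exists.
func ClosestPair(t Topology, free []int) (a, b int, ok bool) {
	best := SYS + 1
	for i := 0; i < len(free); i++ {
		for j := i + 1; j < len(free); j++ {
			l, known := t[key(free[i], free[j])]
			if known && l < best {
				best, a, b, ok = l, free[i], free[j], true
			}
		}
	}
	return a, b, ok
}

func main() {
	topo := Topology{}
	topo.Set(0, 1, PIX) // GPUs 0 and 1 share a PCIe switch
	topo.Set(2, 3, PIX) // GPUs 2 and 3 share another switch
	topo.Set(0, 2, PHB)
	topo.Set(0, 3, PHB)

	// GPU 1 is busy, so the closest free pair is 2 and 3.
	fmt.Println(ClosestPair(topo, []int{0, 2, 3})) // prints "2 3 true"
}
```

A socket-only view would treat all four GPUs as equivalent here, which is the gap described above.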
Yeah. It makes sense.
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗
/feature
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
How is this issue going?
How about using nvml.DeviceGetTopologyCommonAncestor to build the GPU tree?
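A rough sketch of that idea, assuming the pairwise common-ancestor levels have already been collected on the node (for example with nvml.DeviceGetTopologyCommonAncestor) and handed over as a plain matrix. The level constants are chosen to mirror NVML's topology levels but are declared locally, and `GroupsAt` is a hypothetical helper. Grouping the GPUs at each successive level yields the layers of the tree.

```go
package main

import "fmt"

// Local stand-ins for common-ancestor levels (assumed to mirror NVML's ordering).
const (
	LevelSingle     = 10 // single PCIe switch
	LevelHostBridge = 30 // same host bridge
	LevelSystem     = 50 // across the whole system
)

// GroupsAt partitions GPU indices so that two GPUs share a group when their
// common-ancestor level is at most maxLevel. level must be a symmetric matrix.
func GroupsAt(level [][]int, maxLevel int) [][]int {
	n := len(level)
	parent := make([]int, n)
	for i := range parent {
		parent[i] = i
	}
	// find returns the representative of x, halving paths as it goes.
	find := func(x int) int {
		for parent[x] != x {
			parent[x] = parent[parent[x]]
			x = parent[x]
		}
		return x
	}
	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			if level[i][j] <= maxLevel {
				ri, rj := find(i), find(j)
				if ri < rj {
					parent[rj] = ri
				} else if rj < ri {
					parent[ri] = rj
				}
			}
		}
	}
	// Collect members per representative, in ascending GPU index order.
	index := map[int]int{}
	var groups [][]int
	for i := 0; i < n; i++ {
		r := find(i)
		g, seen := index[r]
		if !seen {
			g = len(groups)
			index[r] = g
			groups = append(groups, nil)
		}
		groups[g] = append(groups[g], i)
	}
	return groups
}

func main() {
	// GPUs 0,1 share one switch; GPUs 2,3 share another; all share a host bridge.
	m := [][]int{
		{0, 10, 30, 30},
		{10, 0, 30, 30},
		{30, 30, 0, 10},
		{30, 30, 10, 0},
	}
	fmt.Println(GroupsAt(m, LevelSingle))     // prints "[[0 1] [2 3]]"
	fmt.Println(GroupsAt(m, LevelHostBridge)) // prints "[[0 1 2 3]]"
}
```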
Also, I don't think using the device plugin to build the GPU tree is a good approach. Although the device plugin can cooperate with Volcano, we still need to judge in the predicate function whether the node has enough GPU resources. So why not build the GPU tree in Volcano directly, so that the device plugin only needs to bind the specific GPU?
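As an illustration of what such a predicate check might look like if the scheduler held the GPU groups itself, here is a minimal sketch. `NodeGPUs` and `Fits` are hypothetical names and do not reflect Volcano's actual predicate plugin interface; the sketch only asks whether one closely connected group still has enough free GPUs.

```go
package main

import "fmt"

// NodeGPUs is a hypothetical scheduler-side view of one node's GPUs.
type NodeGPUs struct {
	Groups [][]int      // GPUs that are closely connected, e.g. behind one PCIe switch
	Free   map[int]bool // GPU index -> true when unallocated
}

// Fits reports whether a single group still has at least want free GPUs,
// and returns the first such set found.
func (n NodeGPUs) Fits(want int) ([]int, bool) {
	if want <= 0 {
		return nil, true
	}
	for _, group := range n.Groups {
		var picked []int
		for _, id := range group {
			if n.Free[id] {
				picked = append(picked, id)
				if len(picked) == want {
					return picked, true
				}
			}
		}
	}
	return nil, false
}

func main() {
	node := NodeGPUs{
		Groups: [][]int{{0, 1}, {2, 3}},
		Free:   map[int]bool{1: true, 2: true, 3: true}, // GPU 0 is in use
	}
	fmt.Println(node.Fits(2)) // prints "[2 3] true"
	fmt.Println(node.Fits(3)) // prints "[] false"
}
```

With a check like this in the scheduler, the device plugin would only have to honor the chosen GPU IDs at bind time.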
NVML needs to run on a specific node to get the topology info, so how would we build a topology tree in Volcano directly?
How is this issue going?
Not much progress. https://github.com/volcano-sh/volcano/pull/1779 does some research on GPU topology, but there is no specific plan.
Regarding how to build a topology tree in Volcano directly: alternatively, we could input the GPU topology as a string.
I don't quite understand what you mean. Do you mean configuring the GPU topology of each node through configuration in Volcano?
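One way to read the "topology by string" suggestion is a per-node string that lists GPU pairs with their link type. The sketch below parses a made-up format such as `0-1:PIX,2-3:PIX,0-2:PHB`. Both the format and the `ParseTopology` function are assumptions for illustration; the thread does not define where such a string would live (configuration, node annotation, or elsewhere).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseTopology parses a hypothetical "a-b:LINK" list separated by commas
// into a map from the ordered GPU pair (smaller index first) to its link label.
func ParseTopology(s string) (map[[2]int]string, error) {
	out := map[[2]int]string{}
	for _, entry := range strings.Split(s, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		pair, link, ok := strings.Cut(entry, ":")
		if !ok || strings.TrimSpace(link) == "" {
			return nil, fmt.Errorf("entry %q: expected a-b:LINK", entry)
		}
		left, right, ok := strings.Cut(pair, "-")
		if !ok {
			return nil, fmt.Errorf("entry %q: expected a-b before ':'", entry)
		}
		a, err := strconv.Atoi(strings.TrimSpace(left))
		if err != nil {
			return nil, fmt.Errorf("entry %q: %v", entry, err)
		}
		b, err := strconv.Atoi(strings.TrimSpace(right))
		if err != nil {
			return nil, fmt.Errorf("entry %q: %v", entry, err)
		}
		if a > b {
			a, b = b, a
		}
		out[[2]int{a, b}] = strings.TrimSpace(link)
	}
	return out, nil
}

func main() {
	links, err := ParseTopology("0-1:PIX, 3-2:PIX, 0-2:PHB")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(links[[2]int{2, 3}]) // prints "PIX"
}
```

A string like this would still have to be produced on each node, so it addresses how the scheduler receives the topology rather than how it is discovered.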
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗