[CNCF LFX 2024 01-Mar-May] Volcano: support multi-cluster AI workload scheduling

Open Monokaix opened this issue 2 years ago • 8 comments

What would you like to be added:

Volcano supports multi-cluster AI workload scheduling and provides rich scheduling strategies for choosing an appropriate cluster for each job.

Why is this needed:

Volcano already provides rich scheduling capabilities for AI workloads within a single cluster. As multi-cluster management matures, more and more users run and manage their AI workloads uniformly across multiple clusters. Volcano therefore needs to support multi-cluster AI job scheduling, with capabilities such as job management, gang scheduling, and queue management. Scheduling then happens at two levels: the first level selects an appropriate cluster for the job, and the second level is handled by each cluster's own scheduler, which selects appropriate nodes for the job. This issue only covers the first level.
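To make the two-level split concrete, here is a minimal Python sketch of the first level only: deciding which cluster a gang-scheduled job goes to, leaving node placement to each member cluster. Everything in it is hypothetical and for illustration: `Cluster`, `Job`, `fits_gang`, and `select_cluster` are not the API of the volcano-sh/federation repo, clusters are modeled only by aggregate free CPU/GPU, and the "most GPUs left after placement" strategy is one assumed policy among many.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    """Aggregate free capacity of one member cluster (hypothetical model)."""
    name: str
    free_cpu: float  # free CPU cores summed over the cluster's nodes
    free_gpu: int    # free GPUs summed over the cluster's nodes


@dataclass
class Job:
    """An AI job whose pods must all start together (gang semantics)."""
    name: str
    replicas: int      # number of pods in the gang
    cpu_per_pod: float
    gpu_per_pod: int


def fits_gang(job: Job, cluster: Cluster) -> bool:
    """Gang check: the whole job must fit in a single cluster, never split."""
    return (job.replicas * job.cpu_per_pod <= cluster.free_cpu
            and job.replicas * job.gpu_per_pod <= cluster.free_gpu)


def select_cluster(job: Job, clusters: List[Cluster]) -> Optional[str]:
    """First-level scheduling: choose a cluster (not a node) for the job.

    Assumed strategy for illustration: among clusters that can hold the whole
    gang, pick the one with the most GPUs left after placement, breaking ties
    by leftover CPU. Returns None when no cluster fits, so the caller would
    keep the job pending.
    """
    candidates = [c for c in clusters if fits_gang(job, c)]
    if not candidates:
        return None
    best = max(
        candidates,
        key=lambda c: (c.free_gpu - job.replicas * job.gpu_per_pod,
                       c.free_cpu - job.replicas * job.cpu_per_pod),
    )
    return best.name


clusters = [
    Cluster("cluster-a", free_cpu=64, free_gpu=4),
    Cluster("cluster-b", free_cpu=128, free_gpu=16),
    Cluster("cluster-c", free_cpu=32, free_gpu=32),
]
# 8 pods x (8 CPU, 2 GPU) = 64 CPU, 16 GPU: only cluster-b holds the whole gang.
print(select_cluster(Job("pytorch-train", replicas=8, cpu_per_pod=8, gpu_per_pod=2), clusters))
# → cluster-b
```

Summing free capacity per cluster is deliberately optimistic: resource fragmentation across nodes can still prevent the second-level scheduler in the chosen cluster from placing every pod, so a real first-level scheduler would likely need finer-grained information or a fallback path.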

Monokaix, Jan 24 '24 07:01

Repo is here: https://github.com/volcano-sh/federation

lowang-bh, Jan 24 '24 10:01

We should keep working on this :)

Monokaix, Jan 25 '24 02:01

Hey @Monokaix, I would love to work on this! I have previous experience working with Karmada. I'd love to take it on as a challenge and am looking forward to it.

RohanMishra315, Jan 31 '24 17:01

Hi, thanks for your enthusiasm! Sorry I didn't mention that this is a CNCF LFX project; you can apply for it here :)

Monokaix, Feb 01 '24 01:02

Hey @Monokaix,

I just noticed that this is a CNCF LFX project, and I am thrilled to work on it.

Having worked extensively on multi-cluster scheduling and AI, I bring valuable industry experience. I have built scalable cloud-native and AI applications, ranging from traditional deep learning models to cutting-edge federated learning models deployed in production with frameworks such as Flower, FedML, and PySyft.

I also have hands-on experience with Karmada and would love to explore it further and make valuable contributions.

With this opportunity, I would like to apply my multi-cloud, multi-cluster, and AI skill set under the guidance of the established engineers at Volcano.

SpringWiz11, Feb 02 '24 07:02

Welcome! And you can apply here.

Monokaix, Feb 02 '24 08:02

Hi @Monokaix,

I just applied to the CNCF LFX Mentorship program for this project. I am very interested in it and would love to contribute. Do you have any advice on getting to know the codebase and getting started with the good-first-issue issues?

TrungBui59, Feb 03 '24 11:02

Hi! I'm very interested in this issue, and I just applied through LFX. I'm currently a Karmada reviewer; you can take a look at my GitHub page~

Vacant2333, Feb 09 '24 01:02