karmada
[lfx-mentorship-2022-summer]Cluster Resource modeling
What would you like to be added:
We don't want to collect and store each node's resources in detail (that would be a burden for Karmada to maintain), but we want to build a resource model for each cluster, something like:
resourceModel:
- grade:
    cpu: "1"
    memory: 2Gi
  count: 10
- grade:
    cpu: "2"
    memory: 4Gi
  count: 6
- grade:
    cpu: "4"
    memory: 8Gi
  count: 2
- grade:
    cpu: "8"
    memory: 16Gi
  count: 1
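As a minimal sketch of how a scheduler could consume such a model, the Go code below assumes that every node counted in a grade has at least that grade's CPU and memory free, and estimates how many replicas of a given per-replica request fit. The names Grade, ModelEntry and MaxReplicas are hypothetical, not Karmada's API; CPU is expressed in millicores and memory in bytes.

```go
package main

import "fmt"

const gi = int64(1) << 30 // bytes in one GiB

// Grade is a hypothetical representation of one resource grade.
type Grade struct {
	CPUMilli    int64 // CPU in millicores
	MemoryBytes int64 // memory in bytes
}

// ModelEntry pairs a grade with the number of nodes assumed to have
// at least that grade's resources free.
type ModelEntry struct {
	Grade Grade
	Count int
}

// MaxReplicas conservatively estimates how many replicas with the given
// request fit in a cluster described by model. Each node in an entry is
// credited with exactly its grade's resources (the lower bound).
// Non-positive requests are treated as invalid and yield 0.
func MaxReplicas(model []ModelEntry, reqCPUMilli, reqMemBytes int64) int64 {
	if reqCPUMilli <= 0 || reqMemBytes <= 0 {
		return 0
	}
	var total int64
	for _, e := range model {
		byCPU := e.Grade.CPUMilli / reqCPUMilli
		byMem := e.Grade.MemoryBytes / reqMemBytes
		perNode := byCPU // replicas per node are limited by the scarcer resource
		if byMem < perNode {
			perNode = byMem
		}
		total += perNode * int64(e.Count)
	}
	return total
}

func main() {
	// The example model from the issue.
	model := []ModelEntry{
		{Grade{1000, 2 * gi}, 10},
		{Grade{2000, 4 * gi}, 6},
		{Grade{4000, 8 * gi}, 2},
		{Grade{8000, 16 * gi}, 1},
	}
	fmt.Println(MaxReplicas(model, 2000, 4*gi)) // 14
	fmt.Println(MaxReplicas(model, 1000, 2*gi)) // 38
}
```

Because each node is credited only with its grade's lower bound, this kind of estimate errs toward under-counting, which is usually the safer direction for scheduling.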
Why is this needed:
During scheduling, the karmada-scheduler makes decisions based on a number of factors; one of them is the cluster's resource details.
We introduced ResourceSummary to the Cluster API. For example:
resourceSummary:
  allocatable:
    cpu: "4"
    ephemeral-storage: 206291924Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 16265856Ki
    pods: "110"
  allocated:
    cpu: 950m
    memory: 290Mi
    pods: "11"
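To make the "mechanical counting" concrete: assuming remaining capacity is derived by subtracting allocated from allocatable for each resource, the example above yields the cluster-wide totals computed by this Go sketch. The quantities are converted by hand to integers (CPU in millicores, memory in Ki), the storage and hugepages entries are omitted, and naiveRemaining is a hypothetical helper.

```go
package main

import "fmt"

// Values from the ResourceSummary example, converted by hand:
// cpu "4" = 4000m, memory 290Mi = 290*1024 Ki.
var allocatable = map[string]int64{"cpu": 4000, "memory": 16265856, "pods": 110}
var allocated = map[string]int64{"cpu": 950, "memory": 290 * 1024, "pods": 11}

// naiveRemaining subtracts allocated from allocatable per resource name.
// The result is a single cluster-wide total with no per-node detail.
func naiveRemaining(alloc, used map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(alloc))
	for name, v := range alloc {
		out[name] = v - used[name] // a missing key in used counts as 0
	}
	return out
}

func main() {
	r := naiveRemaining(allocatable, allocated)
	fmt.Println(r["cpu"], r["memory"], r["pods"]) // 3050 15968896 99
}
```

Nothing in these totals says how the remaining 3050m of CPU is spread across the cluster's nodes.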
However, ResourceSummary is not precise enough: it mechanically sums the resources on all nodes but ignores fragmented resources. (For example, for a cluster with 2000 nodes that each have 1 CPU core left, ResourceSummary reports 2000 CPU cores left for the cluster, which is misleading: a workload requesting 2 cores per replica cannot fit on any of those nodes.)
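The 2000-node example can be reproduced with a short Go sketch that compares a summary-style estimate (sum all free CPU, then divide by the request) with a per-node estimate (each replica must fit on a single node). Both function names are hypothetical, only CPU is modeled, and values are in millicores.

```go
package main

import "fmt"

// summedEstimate mimics a summary-based view: add up the free CPU of all
// nodes, then divide by the per-replica request.
func summedEstimate(freeMilli []int64, reqMilli int64) int64 {
	var sum int64
	for _, f := range freeMilli {
		sum += f
	}
	return sum / reqMilli
}

// perNodeEstimate counts replicas that actually fit, because a single
// replica cannot be split across nodes.
func perNodeEstimate(freeMilli []int64, reqMilli int64) int64 {
	var n int64
	for _, f := range freeMilli {
		n += f / reqMilli
	}
	return n
}

func main() {
	free := make([]int64, 2000) // 2000 nodes
	for i := range free {
		free[i] = 1000 // 1 core left on each node
	}
	const req = 2000 // each replica requests 2 cores
	fmt.Println(summedEstimate(free, req))  // 1000
	fmt.Println(perNodeEstimate(free, req)) // 0
}
```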
References
- LFX Mentorship: https://github.com/cncf/mentoring/tree/main/lfx-mentorship
- timeline: https://github.com/cncf/mentoring/tree/main/lfx-mentorship/2022/02-Summer#timeline
Hello @RainbowMango, hope you're doing well. This is Anutosh from India. I'm an open source enthusiast and I'm currently involved with communities focused on numerical and symbolic computation in math and physics, like numpy, sympy, and networkx.
I am keen to take part in the LFX Mentorship program for the summer term, and this project interests me. Being new to the project, I would be glad if you could suggest any relevant resources or links I should go through as a beginner to get to know the project and the library better. Thank you!
Hey @RainbowMango, any update on this? I would like to volunteer to work on this issue under the LFX Mentorship Program and would be happy to send you a proposal. Thanks!
@AALEKH @anutosh491 Thanks for reaching out. This task requires some basic knowledge of Kubernetes; after that, you can get started with the Karmada quick start. Here are some documents that might also help you understand the project.
Thank you @RainbowMango! I have been learning more about Karmada, the problem it solves, and how it works. I will go through the docs soon and then start working on my application!
EDIT1: I went through most of the resources shared above. I am now much more comfortable with Karmada and have a better understanding of how it operates. Thanks for the resources :)
resourceModel:
- grade:
    cpu: "1"
    memory: 2Gi
  count: 10
- grade:
    cpu: "2"
    memory: 4Gi
  count: 6
Hello @RainbowMango, I've been drafting my cover letter for the LFX mentorship program and have a couple of questions about this proposed model.
- What information do grade and count convey? I assume grade would be of type corev1.ResourceList and would carry pairs of resources and quantities.
- resourceModel would also be introduced in the Cluster API only, right?
- Could you also elaborate a bit more on which fragmented resources you mean in this line?
  "But the ResourceSummary is not precise enough, it mechanically counts the resources on all nodes, but ignores the fragment resources"
- Is there any other file or code chunk you would like me to go through? I've gone through some files completely, like pkg/apis/cluster.types.go and pkg/scheduler/core/generic_scheduler.go, which helped me gain a better general idea of the project. (I plan to go through the failover/rescheduling algorithm code, i.e. division.go and the other one, soon.)
- Also, does this call for removing, or rather deprecating, the ResourceSummary class and the objects that use it throughout the codebase?
Hello @RainbowMango, could you please help me with the questions above? Today is the last day to apply. My application material is ready, but I want to confirm these basic points before turning it in. Thanks in advance!
What information do grade and count convey? I assume grade would be of type corev1.ResourceList and would carry pairs of resources and quantities.
The grade and count in the issue are just examples of the kind of model we are trying to build.
resourceModel would also be introduced in the Cluster API only, right?
Probably. Given that the Cluster object is already large, another option may be to build a separate API to store the model.
Where and how to store the model is not the key concern here; what matters more is how the model describes a cluster's resource situation.
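One possible way for the model to describe a cluster's resource situation is to bucket each node's free resources into the highest grade it fully satisfies. The Go sketch below assumes the grades from the issue, sorted ascending in both CPU and memory, and drops nodes that fall below the lowest grade. BuildModel and NodeFree are hypothetical names; in practice the per-node free values could be derived from node allocatable minus the requests of pods on that node, which is also an assumption here.

```go
package main

import "fmt"

const gi = int64(1) << 30 // bytes in one GiB

// Grade is a hypothetical (CPU millicores, memory bytes) threshold.
type Grade struct {
	CPUMilli    int64
	MemoryBytes int64
}

// NodeFree is a hypothetical record of one node's free resources.
type NodeFree struct {
	CPUMilli    int64
	MemoryBytes int64
}

// BuildModel returns one node count per grade. Each node is assigned to the
// highest-index grade whose CPU and memory it both meets; nodes that meet
// no grade are not counted.
func BuildModel(grades []Grade, nodes []NodeFree) []int {
	counts := make([]int, len(grades))
	for _, n := range nodes {
		best := -1
		for i, g := range grades {
			if n.CPUMilli >= g.CPUMilli && n.MemoryBytes >= g.MemoryBytes {
				best = i // keep the last (highest) grade that fits
			}
		}
		if best >= 0 {
			counts[best]++
		}
	}
	return counts
}

func main() {
	grades := []Grade{{1000, 2 * gi}, {2000, 4 * gi}, {4000, 8 * gi}, {8000, 16 * gi}}
	nodes := []NodeFree{
		{1500, 3 * gi},   // meets the 1-core grade only
		{2000, 3 * gi},   // CPU meets the 2-core grade, memory does not
		{4000, 8 * gi},   // meets the 4-core grade
		{500, 1 * gi},    // below the lowest grade, not counted
		{16000, 32 * gi}, // meets the 8-core grade
	}
	fmt.Println(BuildModel(grades, nodes)) // [2 0 1 1]
}
```

The result is one number per grade, which maps onto the count field of the resourceModel example, so only the counts would need to be stored rather than the per-node details.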
Also, could you elaborate a bit more on which fragmented resources you mean in this line?
Please see the example:
For example, a cluster with 2000 nodes and 1 CPU core left on each node: from the ResourceSummary, we get 2000 CPU cores left for the cluster, which is not correct.
Also, does this call for removing, or rather deprecating, the ResourceSummary class and the objects that use it throughout the codebase?
Probably.
@halfrost, please assign this issue to yourself with the command /assign @halfrost to show you are working on it.
@RainbowMango: GitHub didn't allow me to assign the following users: halfrost.
Note that only karmada-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
/assign @halfrost
@halfrost Could you please post the API design here? I know some guys who are interested in it.
OK, I will write a document describing the details of the API design.
/close in favor of #2379
@RainbowMango: Closing this issue.