karmada [lfx-mentorship-2022-summer] Cluster resource modeling

What would you like to be added: We don't want to collect and store each node's resources in detail (that would be a burden for Karmada to maintain), but we do want to build a resource model for each cluster, something like:

resourceModel:
  - grade:
      cpu: "1"
      memory: 2Gi
    count: 10
  - grade:
      cpu: "2"
      memory: 4Gi
    count: 6
  - grade:
      cpu: "4"
      memory: 8Gi
    count: 2
  - grade:
      cpu: "8"
      memory: 16Gi
    count: 1
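
One possible reading of the model above, as a minimal sketch: each grade is a threshold on a node's free resources, and count is the number of nodes whose free resources land in that grade. The issue does not define these semantics, so the bucketing rule here (a node goes to the highest grade it fully satisfies) is an assumption, and the names NodeFree, Grade, ModelEntry and BuildModel are hypothetical. Plain integers stand in for Kubernetes resource quantities to keep the example dependency-free.

```go
package main

import "fmt"

// NodeFree is a hypothetical record of one node's unallocated resources.
type NodeFree struct {
	MilliCPU  int64 // free CPU in millicores
	MemoryMiB int64 // free memory in MiB
}

// Grade is a hypothetical bucket: the minimum free resources a node
// must have to be counted in it.
type Grade struct {
	MilliCPU  int64
	MemoryMiB int64
}

// ModelEntry pairs a grade with the number of nodes assigned to it.
type ModelEntry struct {
	Grade Grade
	Count int
}

// BuildModel assigns each node to the highest grade it fully satisfies
// (assumed rule, not taken from the issue). grades must be sorted in
// ascending order. Nodes below the lowest grade are not counted.
func BuildModel(grades []Grade, nodes []NodeFree) []ModelEntry {
	model := make([]ModelEntry, len(grades))
	for i, g := range grades {
		model[i].Grade = g
	}
	for _, n := range nodes {
		// Walk from the largest grade down and stop at the first match.
		for i := len(grades) - 1; i >= 0; i-- {
			if n.MilliCPU >= grades[i].MilliCPU && n.MemoryMiB >= grades[i].MemoryMiB {
				model[i].Count++
				break
			}
		}
	}
	return model
}

func main() {
	grades := []Grade{{1000, 2048}, {2000, 4096}, {4000, 8192}, {8000, 16384}}
	nodes := []NodeFree{
		{1500, 3000},  // satisfies only the 1-core/2Gi grade
		{2000, 4096},  // satisfies the 2-core/4Gi grade
		{4000, 4096},  // enough CPU for 4 cores, but memory limits it to 2-core/4Gi
		{9000, 32768}, // satisfies the 8-core/16Gi grade
		{500, 1024},   // below the lowest grade, not counted
	}
	for _, e := range BuildModel(grades, nodes) {
		fmt.Printf("cpu=%dm memory=%dMi count=%d\n", e.Grade.MilliCPU, e.Grade.MemoryMiB, e.Count)
	}
}
```

The result stays small regardless of cluster size: one entry per grade rather than one record per node.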

Why is this needed: During scheduling, the karmada-scheduler makes decisions based on a number of factors, one of which is the resource details of the cluster.

We introduced ResourceSummary to the Cluster API. For example:

  resourceSummary:
    allocatable:
      cpu: "4"
      ephemeral-storage: 206291924Ki
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 16265856Ki
      pods: "110"
    allocated:
      cpu: 950m
      memory: 290Mi
      pods: "11"

But the ResourceSummary is not precise enough: it mechanically sums the resources on all nodes and ignores resource fragmentation. (For example, in a cluster with 2000 nodes and 1 CPU core left on each node, the ResourceSummary reports 2000 CPU cores left for the cluster. That is misleading, because no single node can host a workload that needs more than 1 core.)
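
The gap can be shown with a little arithmetic. The sketch below compares an estimate from summed free CPU with an estimate that checks each node separately, using the 2000-node example. The function names SummaryEstimate and PerNodeEstimate are hypothetical, and only CPU (in millicores) is considered; this is an illustration, not how karmada-scheduler computes capacity.

```go
package main

import "fmt"

// SummaryEstimate sums free CPU across all nodes and divides by the
// per-replica request, the way a cluster-wide summary would suggest.
func SummaryEstimate(freeMilliCPU []int64, requestMilliCPU int64) int64 {
	var total int64
	for _, f := range freeMilliCPU {
		total += f
	}
	return total / requestMilliCPU
}

// PerNodeEstimate counts how many replicas fit on each node on its own,
// since a replica cannot be split across nodes.
func PerNodeEstimate(freeMilliCPU []int64, requestMilliCPU int64) int64 {
	var replicas int64
	for _, f := range freeMilliCPU {
		replicas += f / requestMilliCPU
	}
	return replicas
}

func main() {
	// 2000 nodes, each with 1 core (1000m) free.
	nodes := make([]int64, 2000)
	for i := range nodes {
		nodes[i] = 1000
	}
	const request = 2000 // each replica needs 2 cores

	fmt.Println(SummaryEstimate(nodes, request)) // prints 1000
	fmt.Println(PerNodeEstimate(nodes, request)) // prints 0
}
```

The summed view claims room for 1000 two-core replicas, while in practice none can be scheduled.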

References

LFX Mentorship: https://github.com/cncf/mentoring/tree/main/lfx-mentorship
timeline: https://github.com/cncf/mentoring/tree/main/lfx-mentorship/2022/02-Summer#timeline

Sep 28 '21 04:09 RainbowMango

Hello @RainbowMango, hope you're doing well. This is Anutosh from India. I'm an open source enthusiast, currently involved with communities focused on numerical and symbolic computation and algorithms in math and physics, such as numpy, sympy, and networkx.

I am keen to take part in the LFX Mentorship program for the summer term, and this project interests me. Being new to the project, I would be glad if you could suggest any relevant resources or links I should go through as a beginner to get to know the project and the library better. Thank you!

May 13 '22 02:05 anutosh491

Hey @RainbowMango, any update on this comment? I would like to volunteer to work on this issue under the LFX Mentorship program. Happy to send you a proposal for it. Thanks!

May 15 '22 09:05 AALEKH

@AALEKH @anutosh491 Thanks for reaching out. This task requires some basic knowledge of Kubernetes; after that, you can get started with the Karmada quick start. Here are some documents that might also help you understand the project.

May 19 '22 15:05 RainbowMango

Thank you @RainbowMango. I have been learning more about Karmada, the problem it solves, and the functionality behind it. I will go through the docs soon and then start working on my application!

EDIT1: I went through most of the resources shared above. I am now much more comfortable with Karmada and have a better understanding of how it operates. Thanks for the resources :)

May 19 '22 15:05 anutosh491

resourceModel:
  - grade:
      cpu: "1"
      memory: 2Gi
    count: 10
  - grade:
      cpu: "2"
      memory: 4Gi
    count: 6

Hello @RainbowMango, I've been drafting my cover letter for the LFX Mentorship program and had a couple of doubts regarding this proposed model.

What information do grade and count convey? I realize that grade would be of type corev1.ResourceList and would carry pairs of resource name and quantity.
resourceModel would also be introduced only in the Cluster API, right?
Also, could you elaborate a bit more on which fragmented resources you are referring to in this line?

But the ResourceSummary is not precise enough, it mechanically counts the resources on all nodes, but ignores the fragment resources

Also, is there any other file or code chunk you would like me to go through? I've gone through some files completely, such as pkg/apis/cluster.types.go and pkg/scheduler/core/generic_scheduler.go, which helped me get a better general idea of the project. (I plan to go through the failover/rescheduling algorithm code, i.e. division.go and the other one, sometime soon.)

May 27 '22 02:05 anutosh491

Also, does this call for removing, or rather deprecating, the ResourceSummary type and the objects that use it throughout the codebase?

May 27 '22 12:05 anutosh491

Hello @RainbowMango sir, could you please help me with the doubts I've asked above, as today is the last day to apply? My application material is ready, but I just want to confirm these basic doubts before turning it in. Thanks in advance!

May 28 '22 08:05 anutosh491

What information do grade and count convey? I realize that grade would be of type corev1.ResourceList and would carry pairs of resource name and quantity.

The grade and count in the issue are examples of the kind of thing we are trying to build.

resourceModel would also be introduced in the Cluster API only right?

Probably. Given that the Cluster object is already large, another option may be to build a separate API to store the model. Where and how to store the model is not the key concern here; what matters more is how to describe a cluster's resource situation with the model.

Also, could you elaborate a bit more on which fragmented resources you are referring to in this line?

Please see the example:

For example, in a cluster with 2000 nodes and 1 CPU core left on each node, the ResourceSummary reports 2000 CPU cores left for the cluster, which is not correct.

Also, does this call for removing, or rather deprecating, the ResourceSummary type and the objects that use it throughout the codebase?

Probably.

May 28 '22 08:05 RainbowMango

@halfrost, please assign this issue to yourself with the command /assign @halfrost to show that you are working on it.

Jul 14 '22 04:07 RainbowMango

@RainbowMango: GitHub didn't allow me to assign the following users: halfrost.

Note that only karmada-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Jul 14 '22 04:07 karmada-bot

/assign @halfrost

Jul 14 '22 05:07 halfrost

@halfrost Could you please post the API design here? I know some guys who are interested in it.

Jul 26 '22 03:07 RainbowMango

OK, I will write a document detailing the API design.

Jul 26 '22 04:07 halfrost

/close in favor of #2379

Aug 16 '22 13:08 RainbowMango

@RainbowMango: Closing this issue.

Aug 16 '22 13:08 karmada-bot