test: add ut for performance test
This MR aims to add a simple performance UT for Volcano.
Two reasons for supplementing UT:
- There are some questions about performance data, such as https://github.com/volcano-sh/volcano/issues/3203.
- Performance-oriented UT helps Volcano make optimizations.
Run this case using the following commands:
cd pkg/scheduler
go test -v --run=TestRunOnc
I ran this case under the default configuration, without optimization, on a machine (56C 256G), and got this result:
I also added a "latencyrecorder" locally to track the execution time of the various plugins through instrumentation. However, it's not elegant, since it relies on code instrumentation, so I haven't submitted it yet.
Can anyone take a look first and give some suggestions? thx~
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To complete the pull request process, please assign thor-wl
You can assign the PR to them by writing /assign @thor-wl in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@lowang-bh Could you please spare some time to review my code? thx~
Thanks for your hard work. Can it cover all the schedule logic in the real scheduler? My idea is to use the k8s fake client to submit jobs/nodes/pods and then trigger the scheduler run.
I think it covers most of the scheduling logic, except for the part that calls the k8s API (if there are any errors in my statement, please correct me, as I'm still new to Volcano). As I understand it, the cache is used as a middle layer between k8s and Volcano's scheduling logic, so I chose to add a mock cache.
The reason I don't directly construct a real cache through a fake kube client is that certain parts of the cache implementation use kubeconfig to create a new client (such as vcclient). These scenarios are hard to cover with a fake kube client. Do you have any good ideas on how to handle this?
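To illustrate the mock-cache approach in a minimal sketch (all names here are hypothetical, not Volcano's actual Cache API): the test only depends on a small interface, and the mock satisfies it without any kubeconfig or real client:

```go
package main

import "fmt"

// Cache is a drastically simplified stand-in for the scheduler cache
// interface; the real one has many more methods.
type Cache interface {
	Snapshot() string
	Bind(pod string) error
}

// mockCache implements Cache entirely in memory, so a performance UT
// never needs a kubeconfig or an API server.
type mockCache struct {
	pods []string
}

func (m *mockCache) Snapshot() string { return fmt.Sprintf("%d pods", len(m.pods)) }

func (m *mockCache) Bind(pod string) error {
	m.pods = append(m.pods, pod)
	return nil
}

func main() {
	var c Cache = &mockCache{}
	_ = c.Bind("pod-1")
	fmt.Println(c.Snapshot()) // prints "1 pods"
}
```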
Yes, I have some plan to do it. I will write a demo and then we can work together on that feature.
I think we can keep this PR as well. @wangyang0616 @Monokaix
Thanks~ hope I can help
I think there are two processes we should be concerned with. The first is the schedule logic in every session, which mainly consists of the predict and nodeOrder plugins; the second is the kube-apiserver throughput in a real cluster. So your test can only cover the first part, and in my opinion we have no need to mock a whole cache, because the cache is composed of many interfaces; we just need to mock those interfaces, such as the kube client : )
Also, kube-apiserver throughput has a great influence on scheduler performance, so we abstracted a bind interface to let users register their own bind method, e.g. binding a lot of pods in one bind request, so you can also work on this to achieve higher performance.
@Monokaix Thanks for your review~ You are absolutely right that this unit test does not cover the batch binding part, and indeed, the implementation of the mock kubeclient unit test should be able to fully replace it.
However, with this unit test, Volcano users can now assess the performance of the scheduler's core module and carry out optimization work. If the implementation of the mock kubeclient unit test cannot be completed in the short term (which might involve some code modifications or refactoring), would you be open to considering this unit test in the interim?
Furthermore, the bottleneck in the second part is not only in the binding but also in the performance of Kubernetes, so it may not be entirely solvable with unit testing but might require end-to-end testing.
Yeah, you can work on this UT and you are welcome, but can we avoid mocking the whole cache and just use a fake interface where a mock is needed? If that's feasible, you can have a try; if it's a little hard, you can try to wrap a Cache and override the methods that are hard to mock : )
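The wrap-and-override suggestion can be sketched with Go struct embedding (hypothetical types, not the real SchedulerCache): the wrapper embeds the real cache and redefines only the hard-to-mock methods, while every other method is promoted from the embedded struct:

```go
package main

import "fmt"

// RealCache stands in for the production cache; Bind would normally
// need a live API server, which makes it hard to call in a UT.
type RealCache struct{}

func (c *RealCache) Snapshot() string    { return "real snapshot" }
func (c *RealCache) Bind(p string) error { return fmt.Errorf("needs apiserver") }

// wrappedCache embeds the real cache and overrides only Bind; calls to
// Snapshot fall through to the embedded RealCache.
type wrappedCache struct {
	*RealCache
	bound int
}

func (w *wrappedCache) Bind(p string) error {
	w.bound++ // record the bind instead of talking to the API server
	return nil
}

func main() {
	w := &wrappedCache{RealCache: &RealCache{}}
	_ = w.Bind("pod-1")
	fmt.Println(w.Snapshot(), "bound:", w.bound) // prints "real snapshot bound: 1"
}
```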
OK, I will fix that later.
Hi @bysph, I mocked the SchedulerCache in PR "add mock cache for ut and performance test" #3269. You can continue your work on this feature from there, maybe.
Okay, I will submit a UT based on this. Should I wait for the code to be merged before submitting?
Yes, you can try it after https://github.com/volcano-sh/volcano/pull/3269 is merged.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.