training-operator [SDK] Add more unit tests for TrainingClient APIs

We need to add more unit tests for TrainingClient APIs by updating the training_client_test.py. That should help us to detect bugs like this one: https://github.com/kubeflow/training-operator/pull/2160. Let's use this issue to track unit tests for various public APIs.

Feel free to submit PR to add unit tests for one of the following APIs, just comment on this issue on which unit tests you are working on:

[x] create_job
[x] wait_for_job_conditions https://github.com/kubeflow/training-operator/pull/2196
[x] get_job https://github.com/kubeflow/training-operator/pull/2205
[ ] list_jobs
[ ] get_job_conditions https://github.com/kubeflow/training-operator/pull/2253
[ ] is_job_created https://github.com/kubeflow/training-operator/pull/2253
[ ] is_job_running https://github.com/kubeflow/training-operator/pull/2253
[ ] is_job_restarting https://github.com/kubeflow/training-operator/pull/2253
[ ] is_job_succeeded https://github.com/kubeflow/training-operator/pull/2253
[ ] is_job_failed https://github.com/kubeflow/training-operator/pull/2253
[x] get_job_pods https://github.com/kubeflow/training-operator/pull/2175
[x] get_job_pod_names https://github.com/kubeflow/training-operator/pull/2192
[ ] get_job_logs
[x] update_job https://github.com/kubeflow/training-operator/pull/2192
[x] delete_job https://github.com/kubeflow/training-operator/pull/2232

cc @kubeflow/wg-training-leads @deepanker13 @Electronic-Waste @tariq-hasan

/good-first-issue /area sdk /area testing

Jul 09 '24 11:07 andreyvelich

@andreyvelich: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

We need to add more unit tests for TrainingClient APIs by updating the training_client_test.py. That should help us to detect bugs like this one: https://github.com/kubeflow/training-operator/pull/2160. Let's use this issue to track unit tests for various public APIs.

Feel free to submit PR to add unit tests for one of the following APIs, just comment on this issue on which unit tests you are working on:

[x] create_job

[ ] wait_for_job_conditions

[ ] get_job

[ ] list_jobs

[ ] get_job_conditions

[ ] is_job_created

[ ] is_job_running

[ ] is_job_restarting

[ ] is_job_succeeded

[ ] is_job_failed

[ ] get_job_pods

[ ] get_job_pod_names

[ ] get_job_logs

[ ] update_job

[ ] delete_job

cc @kubeflow/wg-training-leads @deepanker13 @Electronic-Waste @tariq-hasan

/good-first-issue /area sdk /area testing

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Jul 09 '24 11:07 google-oss-prow[bot]

Hello @andreyvelich I am working on writing unit test for get_job, so should I seperate the test_data both for create_job and get_job individually or combine them into single test_data by adding one more identifier to each sample which tells for which function ( create_job or get_job) the test is for?

Example test data -

( "create_job_valid_flow_to_create_job_using_image", "create_job", { "name": "test-job", "namespace": "test", "base_image": "docker.io/test-training", "num_workers": 2, }, "success", )

( "get_job_valid", "get_job", {"namespace": "test", "job_name": "valid_job"}, {"mock": "job_response"} ),

Jul 10 '24 04:07 ayushrakesh

Thank you for your contribution @ayushrakesh!

I think it is easier to separate tests data for create_job and get_job API, so it will be easier for us to maintain those lists.

WDYT @droctothorpe @deepanker13 @kubeflow/wg-training-leads ?

Jul 11 '24 17:07 andreyvelich

Thank you for your contribution @ayushrakesh!

I think it is easier to separate tests data for create_job and get_job API, so it will be easier for us to maintain those lists.

WDYT @droctothorpe @deepanker13 @kubeflow/wg-training-leads ?

As my understanding, this is unit tests, not integration tests. So, dedicated test data every for functions would be better.

Jul 11 '24 18:07 tenzen-y

In general, if you can recycle fixtures to reduce redundancy, that's usually advisable; whether or not that's possible depends on the expected IO of the two functions; create_jobs doesn't return anything (though maybe it should), so I'm not sure if it makes sense to consolidate the fixtures in this context.

Jul 11 '24 19:07 droctothorpe

@andreyvelich Thank you for guiding me in contributing to Training-Operator and Katib.

I'll write some test cases for training_client_test.py as soon as I understand the whole logics and the source code of Training-Operator.

Jul 12 '24 07:07 Electronic-Waste

Thank you @Electronic-Waste. So, maybe @ayushrakesh can create unit test for get_job API and @Electronic-Waste can work on wait_for_job_conditions API.

Jul 12 '24 14:07 andreyvelich

I'm interested in working on the unit test for get_job_pods, any objection?

Jul 14 '24 12:07 YosiElias

Sure, thank you for your time @YosiElias! /assign @ayushrakesh @Electronic-Waste @YosiElias

Jul 15 '24 13:07 andreyvelich

Having a different list will make code more readable, Also should we create a folder and have a test file for each function to make it more organised? @andreyvelich

Jul 16 '24 09:07 deepanker13

Having a different list will make code more readable, Also should we create a folder and have a test file for each function to make it more organised? @andreyvelich

@deepanker13 Usually, test file have the same name as the actual file with _test suffix, E.g. you can check the KFP client example: https://github.com/kubeflow/pipelines/tree/master/sdk/python/kfp/client.

Jul 16 '24 10:07 andreyvelich

Will the number of test cases be too many to keep in a single training_client_test module?

I was thinking of moving test data, fixtures and utility functions from the katib_client_test module, for example, and putting these in newly created test_data, conftest and utils modules.

That way, I was thinking that the code may be easier to maintain in the katib_client_test module.

And katib_client_test and its associated test_data, conftest and utils modules would be moved to a newly created sdk/python/v1beta1/kubeflow/katib/tests folder.

Please let me know your thoughts.

Reference: https://github.com/kubeflow/katib/compare/master...tariq-hasan:katib:add-unit-test-for-katib-client-tune

Jul 17 '24 02:07 vector-flow

@tariq-hasan I think, we discussed before with @droctothorpe that we want to keep _test file alongside with actual files: https://github.com/kubeflow/katib/issues/2184#issuecomment-1660475384. I understand that this cause file to grow, but we are already doing the same for go unit tests. For example: https://github.com/kubeflow/katib/blob/master/pkg/webhook/v1beta1/experiment/validator/validator_test.go.

I feel that the test data should be associated with function that we test: get_job, create_job, train, etc. KFP follows the same way: https://github.com/kubeflow/pipelines/tree/master/sdk/python/kfp/client.

Any concerns you see with that @tariq-hasan ?

Jul 17 '24 13:07 andreyvelich

That sounds good. Thanks for the clarification.

Jul 18 '24 05:07 vector-flow

I'm interested in working on the unit test for get_job_pod_names and update_job, any objection?

Jul 30 '24 09:07 YosiElias

Absolutely, thank you @YosiElias!

Jul 30 '24 14:07 andreyvelich

@andreyvelich UTs for wait_for_job_conditions have been added now! PTAL👀 when you are available.

Also, can I work on get_job_conditions and related functions whose names begin with is_job_?

Aug 06 '24 15:08 Electronic-Waste

Hey @andreyvelich,

I can take a look at get_job and delete_job

Aug 08 '24 13:08 Bobbins228

Hi @andreyvelich, I am interested in contributing to training-operator. Can I take list_jobs? Thank you.

Sep 18 '24 19:09 seanlaii

Hi @andreyvelich, I am interested in contributing to training-operator. Can I take list_jobs? Thank you.

Feel free to take the task.

Sep 18 '24 19:09 tenzen-y

Hi, is anyone working on get_job_logs? If not, I can also work on it.

Sep 28 '24 01:09 seanlaii

@seanlaii Sure, please feel free to take unit test for get_job_logs API!

Sep 30 '24 21:09 andreyvelich

Right now, we have unit test for every single public API of TrainingClient() 🎉 Thanks everyone for your time and work! @deepanker13 @Electronic-Waste @Bobbins228 @seanlaii @YosiElias

Oct 14 '24 13:10 andreyvelich