feat(backend): configurable log level for driver / launcher images
Description of your changes:
This PR adds the ability to change the log level in the driver / launcher containers. It is implemented in a pattern similar to the overrides for the driver / launcher images. Specifically, you can add the following environment variables to the ml-pipeline deployment:
spec:
  containers:
    - env:
        - name: DRIVER_LOG_LEVEL
          value: "3"
        - name: LAUNCHER_LOG_LEVEL
          value: "3"
Note: a numeric value such as the literal 3 rather than "3" here makes the deployment spec invalid; spec validation will fail and kubectl edit will reject it with the message: error: deployments.apps "ml-pipeline" is invalid.
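For context, since this follows the same pattern as the image overrides, reading these variables in the compiler presumably looks roughly like the sketch below. The function names and the default value are illustrative assumptions, not the actual code:

// Illustrative sketch only (not the actual compiler code): mirrors the
// V2_DRIVER_IMAGE / V2_LAUNCHER_IMAGE override pattern. Assumes "os" is imported.
func getDriverLogLevel() string {
	if level := os.Getenv("DRIVER_LOG_LEVEL"); level != "" {
		return level
	}
	return "1" // assumed default
}

func getLauncherLogLevel() string {
	if level := os.Getenv("LAUNCHER_LOG_LEVEL"); level != "" {
		return level
	}
	return "1" // assumed default
}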
Other minor alterations
- In this commit, two locations were updated to use the workflowCompiler.driverImage and workflowCompiler.launcherImage attributes, which are populated here. This is a very minor change, but it seemed better to invoke the lookup only once and to match other such usages (in importer.go and dag.go). If there are reasons this should be re-invoked, please let me know.
- The --copy flag was moved into the arguments block to match other implementations. Again, let me know if this is not wanted.
Feedback wanted
The environment variables for this are named similarly to V2_LAUNCHER_IMAGE and V2_DRIVER_IMAGE, but without the V2_ prefix. If anyone has preferences here, I do not, so I'm happy to take any path.
Checklist:
- [x] You have signed off your commits
- [x] The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.
Hi @CarterFendley. Thanks for your PR.
I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Maybe worth adding just one unit test to verify if setting both env vars will generate the Workflow yaml with the new flags.
Will do :)
Okay, in this commit I have updated the compiler tests with logic to optionally take in environment variables and set them:
if tt.envVars != nil {
	for _, envVar := range tt.envVars {
		parts := strings.Split(strings.ReplaceAll(envVar, " ", ""), "=")
		os.Setenv(parts[0], parts[1])
		// Unset after the test case has ended
		defer os.Unsetenv(parts[0])
	}
}
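As a side note, if this helper has access to the *testing.T, Go 1.17+ provides t.Setenv, which restores the previous value automatically when the test finishes (it can't be combined with t.Parallel). A sketch assuming the same tt.envVars format:

for _, envVar := range tt.envVars {
	parts := strings.SplitN(strings.ReplaceAll(envVar, " ", ""), "=", 2)
	// t.Setenv registers a cleanup that restores the previous value after the test.
	t.Setenv(parts[0], parts[1])
}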
Two test cases and golden files have been added to test the logic included in this PR.
{
	jobPath:          "../testdata/hello_world.json",
	platformSpecPath: "",
	argoYAMLPath:     "testdata/with_logging/hello_world.yaml",
	envVars:          []string{"DRIVER_LOG_LEVEL=5", "LAUNCHER_LOG_LEVEL=5"},
},
{
	jobPath:          "../testdata/importer.json",
	platformSpecPath: "",
	argoYAMLPath:     "testdata/with_logging/importer.yaml",
	envVars:          []string{"DRIVER_LOG_LEVEL=5", "LAUNCHER_LOG_LEVEL=5"},
},
/lgtm
I wanted to install this and run it, and also verify that it's backwards compatible. All looks good with that, but I noticed one problem. You missed one of the launcher invocations, roughly here
https://github.com/kubeflow/pipelines/blob/c2a77134265dbbe7ca8492e6e429fd3cc60b8419/backend/src/v2/driver/driver.go#L389-L405
The symptom is that even though I set LAUNCHER_LOG_LEVEL to something other than 1, the user code (impl) container logs always say Setting log level to: '1'
Launcher is invoked twice, in two different ways :facepalm:.
- First, it's invoked to copy itself into the main container (you caught that invocation, the one with --copy); this runs in the kfp-launcher init container.
- Second, it's invoked to actually launch the user's code; this runs in the main container. This second invocation is the one where it's much more important to be able to control the logging, and it is generated in the pod spec patch in driver.go. (This is all super arcane, so let me know if this isn't making sense.)
Once you add it, you should see it in the output of podSpecPatch in a container-driver log. Roughly:
I1018 20:34:19.156021 18 main.go:246] output podSpecPatch=
{"containers":[{"name":"main","image":"python:3.7","command":["/kfp-launcher/launch","--pipeline_name","my-pipeline","--run_id","SNIP","--execution_id","291","--executor_input","SNIP","--component_spec","SNIP","--pod_name","$(KFP_POD_NAME)","--pod_uid","$(KFP_POD_UID)","--mlmd_server_address","$(METADATA_GRPC_SERVICE_HOST)","--mlmd_server_port","$(METADATA_GRPC_SERVICE_PORT)","--",
**(you'll see your log_level here...)**
],
And then the user code container log should respect the launcher log setting:
Setting log level to: 'whatever_you_set'
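To illustrate the idea, here is a rough sketch of where the change would land. This is not the actual driver.go code; the flag and variable names are placeholders: when the driver builds the main container's command for the pod spec patch, the log level needs to be appended there as well.

// Rough sketch only: where a log-level argument could be appended when
// driver.go builds the main-container command for podSpecPatch.
// Flag and variable names are illustrative, not the real implementation.
launcherCmd := []string{
	"/kfp-launcher/launch",
	"--pipeline_name", pipelineName,
	"--run_id", runID,
	// ... remaining launcher arguments elided ...
}
if level := os.Getenv("LAUNCHER_LOG_LEVEL"); level != "" {
	launcherCmd = append(launcherCmd, "--log_level", level)
}
podSpec.Containers[mainContainerIdx].Command = launcherCmd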
@CarterFendley is there anything pending in this PR?
@hbelmiro Yes, as noted by Greg, we want to make sure that the main container for the -impl pod is also set up to be configurable.
Had some discussion on slack about the possible implementations. I am OOO probably all of next week, will add that feature soon after I am back
@CarterFendley any updates on this?
@hbelmiro Thanks for the ping, I'll slack you real quick.
@CarterFendley , just a bump, can this be rebased, and could you provide update on status?
Thanks for the bump, will provide update soon. Apologies for the delay.
Modifications have been made to address the issue found by @gregsheremeta, thanks for pointing that out! Since this is a case where the driver sets the log level of the launcher, a design change was made to use one unified PIPELINE_LOG_LEVEL env var. This avoids the somewhat confusing alternative of passing LAUNCHER_LOG_LEVEL as a command-line argument to the driver (or similar implementations).
The new usage is to set the following environment variable on the ml-pipeline deployment:
spec:
  containers:
    - env:
        - name: PIPELINE_LOG_LEVEL
          value: "3"
As before, note that a numeric value such as the literal 3 rather than "3" here makes the deployment spec invalid; spec validation will fail and kubectl edit will reject it with the message: error: deployments.apps "ml-pipeline" is invalid.
After these updates, the main container now also runs the launcher with the configured log level.
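For illustration only (a sketch with assumed names, not the actual backend code): because the deployment spec requires a string value, the backend would parse it to an integer internally and fall back to a default when it is missing or malformed.

// Sketch only: parsing the string-valued env var into a numeric log level
// with a fallback default. Names and the default value are assumptions.
// Assumes "log", "os", and "strconv" are imported.
logLevel := 1 // assumed default
if v := os.Getenv("PIPELINE_LOG_LEVEL"); v != "" {
	if parsed, err := strconv.Atoi(v); err == nil {
		logLevel = parsed
	} else {
		log.Printf("invalid PIPELINE_LOG_LEVEL %q, falling back to %d", v, logLevel)
	}
}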
@hbelmiro @HumairAK, or any others: Please let me know if you have any additional feedback on this PR. Apologies for the delay in the patch!
/lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: droctothorpe, HumairAK
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~backend/OWNERS~~ [HumairAK]
- ~~manifests/kustomize/OWNERS~~ [HumairAK]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment