
A metrics collector for Kubeflow Pipeline Metrics artifacts

Open · votti opened this issue 3 years ago · 13 comments

/kind feature

Describe the solution you'd like:

Currently, one aim is to do parameter tuning over pipelines in Katib (#1914, #1993).

Kubeflow Pipelines supports dedicated metrics artifacts: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html?h=metrics#kfp.dsl.Metrics and https://www.kubeflow.org/docs/components/pipelines/v1/sdk/pipelines-metrics/
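For reference, a KFP v1 metrics artifact is a small JSON document with a "metrics" list whose entries carry a "name", a numeric "numberValue" and an optional "format". A minimal Python sketch of reading it into name/value pairs, assuming that v1 layout (the function name parse_kfp_v1_metrics is hypothetical, and the v2 Metrics artifact uses a different representation):

```python
import json
from typing import Dict


def parse_kfp_v1_metrics(raw: str) -> Dict[str, float]:
    """Return {metric name: value} from a KFP v1 metrics JSON document (sketch)."""
    doc = json.loads(raw)
    metrics: Dict[str, float] = {}
    for entry in doc.get("metrics", []):
        # "numberValue" must be numeric; the optional "format" ("RAW"/"PERCENTAGE")
        # only controls display in the KFP UI, so it is ignored here.
        metrics[entry["name"]] = float(entry["numberValue"])
    return metrics


example = '{"metrics": [{"name": "accuracy-score", "numberValue": 0.92, "format": "PERCENTAGE"}]}'
print(parse_kfp_v1_metrics(example))  # {'accuracy-score': 0.92}
```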

A dedicated Katib sidecar metrics collector that collects the metrics from these artifacts would make Pipelines and Katib work together quite nicely.

The current workaround is to use the stdout collector, but this causes issues with the complex commands in pipeline components (#1914; I will open a dedicated issue soon).
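With the stdout workaround, the pipeline step has to print each metric as a name=value line that the collector can pick up. A rough sketch of that round trip; the regular expression is assumed to mirror Katib's default metrics filter (it may differ between Katib versions), and to_stdout_lines and parse_stdout are hypothetical helpers, not Katib code:

```python
import re
from typing import Dict, Iterable, List

# Assumed to mirror Katib's default metrics filter; check the Katib version you run.
DEFAULT_FILTER = re.compile(r"([\w|-]+)\s*=\s*([+-]?\d*(\.\d+)?([Ee][+-]?\d+)?)")


def to_stdout_lines(metrics: Dict[str, float]) -> List[str]:
    """Render metrics as 'name=value' lines, one per metric."""
    return [f"{name}={value}" for name, value in metrics.items()]


def parse_stdout(lines: Iterable[str], wanted: Iterable[str]) -> Dict[str, float]:
    """Extract only the requested metric names from the given output lines."""
    wanted_set = set(wanted)
    found: Dict[str, float] = {}
    for line in lines:
        for match in DEFAULT_FILTER.finditer(line):
            name, value = match.group(1), match.group(2)
            if name not in wanted_set:
                continue
            try:
                found[name] = float(value)
            except ValueError:
                # The value group can match "" or fragments like "e5"; skip those.
                continue
    return found


lines = to_stdout_lines({"accuracy-score": 0.92, "loss": 0.31})
print(lines)  # ['accuracy-score=0.92', 'loss=0.31']
print(parse_stdout(["epoch 1 done"] + lines, wanted=["accuracy-score"]))  # {'accuracy-score': 0.92}
```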

Anything else you would like to add:



votti · Nov 17 '22

I think I may give this a go. I would try to build this in Python, analogous to the tfevent-metricscollector. Does this sound like a reasonable approach? I am also open to other suggestions.

votti · Nov 19 '22

Small update: I now have a metrics collector for Kubeflow v1 pipelines (modeled after the tfevent-metricscollector) that I think should work; according to the logs, it already manages to capture the pipeline metrics artifacts.
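The core of such a collector can be small: wait until the component has written its metrics file, then parse it. A minimal polling sketch, assuming the KFP v1 JSON layout; this is not the code in the linked branch, and wait_for_metrics_file is a hypothetical name. A production collector would more likely wait for the main container to finish, so that it never reads a half-written file.

```python
import json
import os
import time
from typing import Dict


def wait_for_metrics_file(path: str, timeout_s: float = 600.0, poll_s: float = 1.0) -> Dict[str, float]:
    """Poll until `path` exists, then return {metric name: value} from the KFP v1 JSON."""
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"{path} did not appear within {timeout_s} s")
        time.sleep(poll_s)
    # Existence alone does not guarantee that the writer has finished.
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    return {m["name"]: float(m["numberValue"]) for m in doc.get("metrics", [])}
```

Usage would be along the lines of wait_for_metrics_file("/tmp/outputs/mlpipeline_metrics/data"), the path used in the metricsCollectorSpec of this thread.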

What I am failing at is passing the current trial name to the custom collector in the metricsCollectorSpec. Essentially, I am using a configuration very similar to the custom collector example here: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml#L13-L35

My CLI metrics collector takes an argument "-t" or "--trial_name" with the trial name to use for reporting (exactly like the tfevent-metricscollector). Does anyone have a hint on how to configure this so that the current trial name is passed as an argument?
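For illustration, that command-line surface could look like the following argparse sketch. Only -t/--trial_name comes from the description above; --metrics_path and --metric_names are hypothetical flags, not the actual options of the kfpv1 or tfevent collectors.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Collect KFP v1 metrics for a Katib trial (sketch).")
    # -t/--trial_name mirrors the flag described in this thread.
    parser.add_argument("-t", "--trial_name", required=True, help="Katib trial name used for reporting")
    # Hypothetical flags, for illustration only.
    parser.add_argument("--metrics_path", default="/tmp/outputs/mlpipeline_metrics/data")
    parser.add_argument("--metric_names", default="", help="Comma-separated metric names to report")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Turn "a,b" into ["a", "b"], dropping empty entries.
    args.metric_names = [m for m in args.metric_names.split(",") if m]
    return args


# "my-trial" is a made-up trial name.
args = parse_args(["-t", "my-trial", "--metric_names", "accuracy-score,loss"])
print(args.trial_name, args.metric_names)  # my-trial ['accuracy-score', 'loss']
```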

votti · Feb 09 '23

I am now really a bit confused: reading the source code of the metrics collector sidecar injection (inject_webhook), it looks to me as if the trial name should actually be added to the args: https://github.com/kubeflow/katib/blob/22b740802a06d8926255b204076837d6e344ebb9/pkg/webhook/v1beta1/pod/inject_webhook.go#L302

Yet looking at the pods Katib creates, all these arguments seem to be missing. Is there anything I am overlooking?

My current section specifying the metrics collector:

```yaml
metricsCollectorSpec:
  source:
    fileSystemPath:
      path: "/tmp/outputs/mlpipeline_metrics/data"
      kind: File
  collector:
    customCollector:
      image: votti/kfpv1-metricscollector:v0.0.7
      imagePullPolicy: Always
      name: custom-metrics-logger-and-collector
    kind: Custom
```

This results in the following sidecar container specification:

```yaml
- image: votti/kfpv1-metricscollector:v0.0.7
  imagePullPolicy: Always
  name: custom-metrics-logger-and-collector
  resources: {}
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /tmp/outputs/mlpipeline_metrics
    name: metrics-volume
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: kube-api-access-rnmkw
    readOnly: true
```

votti · Feb 10 '23

Thank you for working on this @votti! Would it be easier to use a push-based metrics collector for such use cases (ref: https://github.com/kubeflow/katib/issues/577)? Then we wouldn't even need a sidecar to collect metrics.

cc @johnugeorge @gaocegege @tenzen-y

andreyvelich · Feb 10 '23

I now managed to implement a working metrics collector for Kubeflow Pipeline V1 Metrics artifacts: https://github.com/d-one/katib/tree/feature/kfpv1-metricscollector/cmd/metricscollector/v1beta1/kfpv1-metricscollector

For a full example of how this is used, see: https://github.com/votti/katib-exploration/blob/main/notebooks/mnist_pipeline_v1.ipynb

Re push: I think it is an interesting idea to build a dedicated Kubeflow Pipelines component that can push metrics to Katib. The challenge I see here is how to pass the current trial_name. Otherwise, the component could be built quite similarly to the kfpv1-metricscollector.
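Seen from the component side, a push-style step mainly needs the trial name as an explicit input. A minimal sketch, assuming the trial name arrives as a regular parameter and the actual transport to Katib is hidden behind a send callable; MetricRecord, push_metrics and send are hypothetical names, and the string-valued records only loosely follow the shape of Katib observation logs. As far as I know, Katib trial templates can reference trial metadata such as ${trialSpec.Name} in trialParameters, which may be one way to supply the name, assuming the Katib version in use supports it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MetricRecord:
    trial_name: str
    metric_name: str
    value: str      # kept as a string, loosely following Katib observation logs
    timestamp: str  # ISO 8601 timestamp in UTC


def push_metrics(
    trial_name: str,
    metrics: Dict[str, float],
    send: Callable[[List[MetricRecord]], None],
) -> List[MetricRecord]:
    """Build one record per metric and hand them to `send` (hypothetical transport)."""
    if not trial_name:
        # Without the trial name the metrics cannot be attributed to a trial.
        raise ValueError("trial_name is required")
    now = datetime.now(timezone.utc).isoformat()
    records = [MetricRecord(trial_name, name, str(value), now) for name, value in metrics.items()]
    send(records)
    return records


sent: List[MetricRecord] = []
push_metrics("my-trial", {"accuracy-score": 0.92}, sent.extend)
print(sent[0].metric_name, sent[0].value)  # accuracy-score 0.92
```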

votti · Feb 10 '23

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] · Aug 24 '23

Hello, any update on KFP v2?
Cheers!

AlexandreBrown · Aug 24 '23

@AlexandreBrown We've worked on a Katib + KFP example in this PR: https://github.com/kubeflow/katib/pull/2118 Any help with reviewing this PR is appreciated!

andreyvelich · Aug 24 '23

> @AlexandreBrown We've worked on a Katib + KFP example in this PR: https://github.com/kubeflow/katib/pull/2118 Any help with reviewing this PR is appreciated!

Great to see progress! Was this PR made for KFP v2 or only v1?

AlexandreBrown · Aug 29 '23

> Great to see progress! Was this PR made for KFP v2 or only v1?

That PR is only for v1.

tenzen-y · Aug 29 '23

@AlexandreBrown This is based on V1, as I only managed to compile the pipeline as an Argo Workflow manifest in KFP V1. If there is a way to export a KFP V2 pipeline as an Argo Workflow, it should be straightforward to use V2 as well.

votti · Aug 29 '23

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] · Nov 27 '23

/lifecycle frozen

tenzen-y · Nov 27 '23