kubeflow-introduction

ks apply cloud -c train - does not work

Open majacal opened this issue 6 years ago • 6 comments

matthias in ~/projekt/ml/kubeflow-introduction-master/ksonnet-kubeflow λ ks apply cloud -c train

ERROR handle object: patching object from cluster: merging object with existing state: unable to recognize "/var/folders/v1/zhyqh9cd4zvdb5vpz58f70n00000gn/T/ksonnet-mergepatch105291615": no matches for kind "Tfjob" in version "kubeflow.org/v1alpha1"

I tried renaming TFJob to Tfjob, but it doesn't work. Maybe this is an old version?

VERSION=v0.2.0-rc.1
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}

I also tried:

VERSION=v0.3.4

but then ks cannot fetch the kubeflow/tf-job package at that version.
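
The error above suggests the API server has no resource of that kind registered under `kubeflow.org/v1alpha1`. One way to check what the cluster actually serves is to list its API versions. A minimal sketch, assuming `kubectl` is configured for the same cluster that `ks apply cloud` targets (the helper name `kubeflow_api_versions` is hypothetical):

```shell
# Print only the kubeflow.org API group versions served by the cluster.
# Assumes kubectl points at the cluster that ks applies to.
kubeflow_api_versions() {
  kubectl api-versions | grep '^kubeflow\.org/'
}
```

If calling `kubeflow_api_versions` prints nothing, or prints a version other than `v1alpha1`, the installed TFJob definition probably does not match the version the `train` component requests.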

majacal avatar Dec 11 '18 22:12 majacal

In 0.3 the TFJob prototypes are now in the examples package: https://github.com/kubeflow/kubeflow/tree/v0.3-branch/kubeflow/examples/prototypes

If you use kfctl to install Kubeflow and create your ksonnet application the correct packages should be installed.

jlewi avatar Dec 29 '18 21:12 jlewi

Click-to-deploy or kfctl will get you a full installation of Kubeflow, but I'm not sure what other problems you will run into when running the codelab on a version other than v0.2.0-rc.1.

The lab needs to be updated. It's on my list to work on in the new year and I'm happy to work with you @MatthiasHertel if you'd like to help :) Otherwise, stay tuned! cc @DanSanche

texasmichelle avatar Dec 30 '18 03:12 texasmichelle

Same problems here. Did you solve it?

AletOne avatar Jan 22 '19 02:01 AletOne

A new version of this codelab will be released shortly. I'll update here when it's ready.

In the meantime, this tutorial should help you get started.

daniel-sanche avatar Jan 22 '19 19:01 daniel-sanche

@MatthiasHertel I ran into the same issue and was able to work around it with the following:

  • Descend into the components directory and retrieve the corrected tf-job-simple.jsonnet file:
cd components
wget https://raw.githubusercontent.com/kubeflow/kubeflow/v0.3-branch/kubeflow/examples/prototypes/tf-job-simple.jsonnet
  • Insert params capability into said file:
sed -i '1ilocal env = std.extVar("__ksonnet/environments");\nlocal params = std.extVar("__ksonnet/params").components.train;' tf-job-simple.jsonnet
  • Update the component's params according to the original codelab:
ks param set tf-job-simple image $TRAIN_PATH
ks param set tf-job-simple name "train-"$VERSION_TAG
  • And finally apply to cluster:
ks apply cloud -c tf-job-simple
  • Which yields the following:
INFO Applying tfjobs default.train-1549194635     
INFO Creating non-existent tfjobs default.train-1549194635
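
The `sed -i '1i...'` step above relies on GNU sed syntax and may fail with the BSD sed shipped on macOS. A portable sketch of the same prepend, using only `printf`, `cat`, and `mktemp` (the function name `prepend_ksonnet_params` is hypothetical; the two inserted lines are the ones from the workaround):

```shell
# Prepend the two ksonnet header lines to a jsonnet file without sed -i.
# Usage: prepend_ksonnet_params tf-job-simple.jsonnet
prepend_ksonnet_params() {
  file="$1"
  tmp="$(mktemp)" || return 1
  {
    # Same two lines the sed one-liner inserts at the top of the file.
    printf '%s\n' 'local env = std.extVar("__ksonnet/environments");'
    printf '%s\n' 'local params = std.extVar("__ksonnet/params").components.train;'
    cat "$file"
  } > "$tmp" && mv "$tmp" "$file"
}
```

Writing to a temporary file and moving it into place avoids the in-place editing flag, whose syntax differs between sed implementations.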

@DanSanche @texasmichelle happy to issue a PR if you want to point me to the correct repo :+1:

ggodreau avatar Feb 03 '19 16:02 ggodreau

Hey @ggodreau, thanks for looking into it, but we actually just published the updated version of the codelab on Friday. This repo will be deprecated as the resources are now pulled from kubeflow/examples

Hope this solves the issue for everyone

daniel-sanche avatar Feb 03 '19 18:02 daniel-sanche