kfctl_existing postsubmit is failing

Open jlewi opened this issue 6 years ago • 11 comments

Here's the postsubmit test grid: https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-postsubmit&group-by-target=&group-by-hierarchy-pattern=%5B%5Cw-%5D%2B

Lots of red. Here's a failed run. https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/kubeflow_kubeflow/kubeflow-postsubmit/1186679529897201664

jlewi avatar Oct 22 '19 21:10 jlewi

@yanniszark any update on this?

jlewi avatar Oct 29 '19 17:10 jlewi

Here's a recent run https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/kubeflow-periodic-master/1189264851759796234

http://testing-argo.kubeflow.org/workflows/kubeflow-test-infra/kubeflow-periodic-master-kfctl-go-existing-v07-6234-0336?tab=workflow

Here are the logs from the test

kubeflow-periodic-master-kfctl-go-existing-v07-6234-0336-3078399875.log.txt

This looks like a problem with the test. The cluster creation script is failing because the cluster already exists.
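A minimal sketch of how a test setup step could tolerate a pre-existing cluster instead of failing. The actual cluster creation script is not shown in this thread, so `ensure_cluster`, `cluster_exists`, and `create_cluster` are hypothetical names used only for illustration.

```python
def ensure_cluster(name, cluster_exists, create_cluster):
    """Create the cluster only when it is absent.

    `cluster_exists` and `create_cluster` are hypothetical callables that
    wrap whatever tooling the test really uses; they are assumptions here.
    Returns True if the cluster was created, False if it already existed.
    """
    if cluster_exists(name):
        # Reuse the existing cluster rather than failing the whole run.
        return False
    create_cluster(name)
    return True


if __name__ == "__main__":
    existing = {"kfctl-test"}  # stand-in for real cluster state
    created = ensure_cluster("kfctl-test", existing.__contains__, existing.add)
    print("created" if created else "already exists, reusing")
```

Whether reusing a leftover cluster is acceptable depends on the test; deleting and recreating it may be the safer choice if stale state could affect results.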

Since the problem appears to be in the test, I'm going to say that right now this is not release blocking.

jlewi avatar Oct 30 '19 01:10 jlewi

@yanniszark Do you agree that right now this isn't release blocking given we don't have signal indicating that Kubeflow is broken and not the test?

jlewi avatar Oct 30 '19 01:10 jlewi

I thought there was another issue for this where I commented. The Kubeflow installation seems to work fine when installed by hand. I would still like to know why the tests are failing, though; there may be a bug hidden in there. I will update this issue with my findings today.

yanniszark avatar Oct 30 '19 13:10 yanniszark

@jlewi my findings so far:

  1. The initial installation fails because of this code: https://github.com/kubeflow/kubeflow/blob/7f64d8b023147927b74139bbdbbffa1ffca536bc/py/kubeflow/kubeflow/ci/kfctl_go_test_utils.py#L261

The v0.7 existing_arrikto config doesn't use a Plugin, so the test fails with KeyError.

  2. After fixing that, I get timeouts waiting for the minio deployment. Not sure what the issue is yet; it could just be that the test doesn't wait long enough.
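The two findings above can be sketched roughly as follows. This is not the code in `kfctl_go_test_utils.py`: it assumes the config is loaded as a dict with plugins under `spec.plugins`, and `get_plugins` and `wait_until` are hypothetical helper names. The first helper avoids the `KeyError` when a config defines no plugins; the second polls a readiness check against an explicit deadline, so the timeout is easy to raise.

```python
import time


def get_plugins(config):
    """Return the plugin list from a config dict, or [] if none is defined.

    Assumption: plugins live under spec.plugins. A config without plugins
    omits the key, so indexing with [] would raise KeyError.
    """
    return config.get("spec", {}).get("plugins") or []


def wait_until(check, timeout_s, interval_s=1.0):
    """Poll check() until it returns a truthy value or timeout_s elapses.

    Returns True if the check succeeded, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # A config with no plugins no longer raises.
    print(get_plugins({"spec": {}}))

    # Stand-in readiness check that succeeds on the third poll.
    polls = []
    ready = wait_until(lambda: polls.append(1) or len(polls) >= 3,
                       timeout_s=5, interval_s=0.01)
    print(ready)
```

In a real test the `check` callable would query the deployment's status; making the timeout a parameter helps distinguish a deployment that is slow from one that never becomes ready.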

yanniszark avatar Oct 31 '19 11:10 yanniszark

@yanniszark Any update on this?

jlewi avatar Nov 25 '19 19:11 jlewi

@yanniszark any update?

jlewi avatar Dec 18 '19 00:12 jlewi

@jlewi thanks for the ping. I haven't had many cycles to put into this; I will try to allocate some.

yanniszark avatar Dec 19 '19 18:12 yanniszark

@yanniszark any chance you will be able to work on this in the coming weeks? It would be great to have a working test in advance of the 1.0 release.

jlewi avatar Jan 06 '20 14:01 jlewi

Any chance of an update on this? It would be great to have working tests for the 1.1 release. Thanks

crobby avatar Jun 16 '20 14:06 crobby

Issue-Label Bot is automatically applying the labels:

| Label | Probability |
| --- | --- |
| area/testing | 0.76 |

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.

issue-label-bot[bot] avatar Jun 16 '20 14:06 issue-label-bot[bot]