Distributions and Kubeflow 1.6 release

Open annajung opened this issue 2 years ago • 43 comments

The goal of this issue is to track the progress of distributions alongside the 1.6 release

While we hope all distros will manage to be ready when the KF 1.6 release is out, this is sometimes impossible to achieve. In this issue, we want to both keep track of the progress of distributions toward the KF 1.6 release and also which of the distros will be working on KF 1.6 (testing during the distribution testing cycle) even if they can't meet the KF 1.6 deadline.

Tagging distribution owners identified in https://github.com/kubeflow/community/pull/560. (Any new or missed distro owners: please comment on this issue to be tracked with the 1.6 release.)

| Distribution | Representatives | State |
|---|---|---|
| Arrikto EKF | @kimwnasptd | (stretch goal) participating in 1.6 |
| Arrikto MiniKF | @kimwnasptd | (stretch goal) participating in 1.6 |
| AWS | @surajkota | helping with testing in 1.6 |
| Charmed Kubeflow | @DomFleischmann | participating in 1.6 |
| Google Cloud | @zijianjoy @gkcalat | participating in 1.6 |
| IBM | @yhwang | participating in 1.6 |
| Nutanix | @johnugeorge | participating in 1.6 |
| Kubeflow with Argo CD | @DavidSpek | |
| OpenShift | @VaishnaviHire @LaVLaS | participating in 1.6 |
| Oracle Cloud Infrastructure | @julioo | participating in 1.6 |

Please let us know if you'll be participating in the 1.6 release by answering the following questions:

  • Are you planning on having your distro ready in sync with the KF 1.6 release?
  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
  • If you cannot participate, when can the community expect your distro to be ready for release 1.6?

[Update on June 14th] Distribution testing is scheduled to take place from July 20th to August 10th ~Note: After the 2 weeks delay, distribution testing is now scheduled to take place from July 6th to July 27th (ref https://github.com/kubeflow/community/pull/561)~

cc @kubeflow/release-team @jbottum

annajung avatar Jun 13 '22 18:06 annajung

For IBM IKS,

  • Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

yhwang avatar Jun 13 '22 19:06 yhwang

For Nutanix Karbon,

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

johnugeorge avatar Jun 13 '22 19:06 johnugeorge

For AWS,

Are you planning on having your distro ready in sync with the KF 1.6 release?

TBD. If not in sync, we will follow up

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

surajkota avatar Jun 14 '22 00:06 surajkota

For Canonical's Charmed Kubeflow

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

DomFleischmann avatar Jun 14 '22 09:06 DomFleischmann

Hi distribution owners! After checking with all WGs, the release team has decided to extend all release deadlines by 2 more weeks.

  • Email announcement: https://groups.google.com/g/kubeflow-discuss/c/I4l97HvrGEA/m/227aCe_mCgAJ
  • New schedule PR: https://github.com/kubeflow/community/pull/562

Distribution testing is now scheduled to take place from July 20th to August 10th.

annajung avatar Jun 14 '22 19:06 annajung

For OpenShift,

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

LaVLaS avatar Jun 22 '22 04:06 LaVLaS

For Google Cloud

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

zijianjoy avatar Jun 22 '22 05:06 zijianjoy

For Oracle Cloud Infrastructure

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

julioo avatar Jun 27 '22 12:06 julioo

cc @gkcalat for working on Kubeflow on Google Cloud release.

zijianjoy avatar Jun 27 '22 16:06 zijianjoy

A little bit late to the party, but for Arrikto EKF, MiniKF

Are you planning on having your distro ready in sync with the KF 1.6 release?

It will be a stretch, but this will be our goal.

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

kimwnasptd avatar Jun 29 '22 20:06 kimwnasptd

Also heads up to everyone for the following items from Notebooks and Manifests WG:

  1. Status with K8s 1.22 and Notebooks https://github.com/kubeflow/manifests/issues/2199#issuecomment-1170457965
  2. We are targeting Istio 1.14 instead of 1.13 https://github.com/kubeflow/manifests/issues/2200#issuecomment-1170381632
  3. We are targeting Knative 1.4 https://github.com/kubeflow/manifests/issues/2207#issuecomment-1163353597

We'll also be on the lookout during the feature freeze for any bug that could result from any of the above updates, but we are confident there won't be any major issues. Of course, don't hesitate to report and ping us if you bump into anything unexpected!

kimwnasptd avatar Jun 29 '22 20:06 kimwnasptd

Hi Distribution owners, sorry for the delay in providing you with a new RC to test with.

A bug was identified for the Notebooks WG, and they are currently working on providing the release team with a new release to be used to cut a new 1.6 RC.

We hope to have the 1.6 RC1, which contains the fix for the identified bug, available for you soon. Once the new RC is available, I'll leave an update here and send out an announcement to kubeflow-discuss.

If you want to get started with testing, please note the issue with the Jupyter web app.

In addition, here are the PRs that would be included in the new RC.

  • https://github.com/kubeflow/manifests/pull/2254
  • https://github.com/kubeflow/manifests/pull/2256
  • https://github.com/kubeflow/kubeflow/pull/6583

annajung avatar Jul 21 '22 17:07 annajung

Hi Distribution owners, providing you with another update on the RC.

As discussed in the release team meeting today (July 25th), we hope to have a new RC available for everyone early this week. We are waiting for https://github.com/kubeflow/kubeflow/pull/6591 to merge, as it aims to address the problem with building images using GitHub Actions. Once a new notebook release is available, a PR needs to be created against the manifests repo.

The release team would like to stick with the current schedule and keep distribution testing running until August 10th as planned. However, given the delay in getting the new RC out, we would also like to gather your feedback on the current timeline and on whether you think the release should be delayed to allow more time for distribution testing. If you have any concerns about the current release timeline, please reach out soon so they can be reviewed before the end of distribution testing.

annajung avatar Jul 25 '22 17:07 annajung

Kubeflow v1.6.0-rc.1 is now available!

  • Announcement: https://groups.google.com/g/kubeflow-discuss/c/LJIHnACbHYY/m/Lcvah_xvAQAJ
  • RC 1 contains a known bug when creating a notebook through the UI. The Notebook WG is working on a fix that may result in another RC in the future. For more details, you can follow the discussion in the following PRs: https://github.com/kubeflow/kubeflow/pull/6596 and https://github.com/kubeflow/kubeflow/pull/6599.

annajung avatar Jul 27 '22 16:07 annajung

Hi Distribution owners, a friendly reminder to share any issues you ran into when testing and to update the Kubeflow distribution docs:

  • https://www.kubeflow.org/docs/started/installing-kubeflow/
  • https://www.kubeflow.org/docs/distributions/

Distribution testing and Doc updates are both scheduled to end on Wed Aug 10th 2022.

annajung avatar Aug 04 '22 13:08 annajung

@annajung In the last community meeting, there was a discussion to extend by one extra week

johnugeorge avatar Aug 04 '22 14:08 johnugeorge

Testing is in progress on the AWS side, with no new issues so far. We will post an update by early next week. @annajung, when do we expect the final RC to be out? I couldn't get a clear idea from the community meeting notes.

https://github.com/awslabs/kubeflow-manifests/issues/309

surajkota avatar Aug 04 '22 18:08 surajkota

Testing on GCP. We are observing profiles-deployment crashing. Could it be related to #2263? Has anyone else experienced it? Besides, we need the latest changes to contrib/metacontroller to be included in 1.6.0. They were not included in v1.6.0-rc.1. Thank you!

gkcalat avatar Aug 04 '22 18:08 gkcalat

Testing on OCI. Will report status early next week. Inspired by https://github.com/IBM/manifests/issues/47

julioo avatar Aug 05 '22 08:08 julioo

> @annajung In the last community meeting, there was a discussion to extend by one extra week

Thanks @johnugeorge for raising this! I was not able to attend the last community meeting, but other release team members did inform me that distribution owners who were present in the meeting asked for an extension.

In addition, during the August 8th release team meeting, the release team discussed the following issues, identified from the issues and comments raised in the distribution tracking and release tracking issues:

  • Notebook: https://github.com/kubeflow/kubeflow/issues/6572 (an issue that wasn't completely resolved in RC1)
  • Notebook: https://github.com/kubeflow/manifests/pull/2263 (duplicate liveness probe)
  • Manifest: https://github.com/kubeflow/manifests/pull/2255 (was not included in the RC 1)

Based on the extension request and a need for a new RC, we are working on a new release timeline to provide to the community. We contacted the Notebook WG to determine if the issues identified are release blocking issues and if they will be providing another RC for the release.

Unless there are release-blocking issues, we'll stick to the date agreed on during the community meeting last week, which is August 17th for distribution testing to end.

I plan to send out an official announcement to kubeflow-discuss about the new timeline after catching up with the Notebook WG or before the 10th, whichever comes first.

annajung avatar Aug 09 '22 01:08 annajung

Hi everyone, I owe an update here - will be sending out a message on kubeflow-discuss today as well.

After catching up with notebook WG and investigating the three issues identified, here is where we are now.

  1. Missing Notebook image group
    • This has been identified as a release-blocking issue. There is a PR open that might fix the issue. However, even if we get this merged, the lead who can provide the release team with a new Notebook RC is not available until the week of Aug 22nd.
  2. Duplicate liveness probe in Notebook controller manager
    • After investigating this further, it looks like this is a non-issue for those using kustomize 3.x as stated in the kubeflow manifest installation README. For those using kustomize 4.x, there is a PR open to fix it: https://github.com/kubeflow/kubeflow/pull/6604. Since Kubeflow Manifest does not support kustomize 4.x, this has been labeled as a non-release-blocking issue.
  3. Metacontroller update not included in the RC 1
    • This PR adds the metacontroller into the /contrib directory, which does not get used by default in the pipeline installation. By default, the metacontroller from /third-party is used, and it already has the update that was made to /contrib. This means that the current RC already contains this change, so there are no changes to any functionality. I reached out to the pipeline team to get their feedback. Until we hear otherwise, the release team has labeled this as a non-blocking issue and has no plans to cut a new RC for this change.

Overall,

  • There is no plan for a new RC until the fix for the notebook issue (1) is available
  • We will be extending the release until the blocking issue is resolved. The new release date is TBD; we will propose a date to the community for feedback
  • I do not believe cutting a new RC with a notebook issue fixed will have a huge impact on distributions. As of now, the plan is only to include the fix for the Notebook image group unless other release-blocking issues are identified
  • Please keep providing issues you have identified while testing
  • With the extra time, don't forget to update the kubeflow distribution docs as well

Thanks everyone!

cc @kubeflow/release-team

annajung avatar Aug 10 '22 18:08 annajung

The official announcement for the release delay has been sent to the kubeflow-discuss mailing list. The proposed timeline PR is also available if distribution owners would like to provide feedback.

  • Email announcement: https://groups.google.com/g/kubeflow-discuss/c/h0NX0XyUo74/m/6GTCsD7KAAAJ
  • New proposed release schedule: https://github.com/kubeflow/community/pull/569 (open for feedback until Aug 17th)

The proposed timeline moves the distribution testing end date to August 31st

annajung avatar Aug 10 '22 19:08 annajung

Hi folks, here's a list of issues we have run into (Charmed Kubeflow).

  • kubeflow/pipelines#7196, canonical/kfp-operators#54 - might get fixed by kubeflow/pipelines#7351
  • kubeflow/kubeflow#6056, canonical/bundle-kubeflow#460 - might get fixed by kubeflow/kubeflow#6415
  • kubeflow/manifests#2087

We also expect kubeflow/manifests#2150 to be merged soon, either for the next RC or patch release.

DnPlas avatar Aug 11 '22 00:08 DnPlas

Hello, here is an issue the AWS team has found on v1.6.0-rc.1. We currently consider this a release blocker, as it is a feature regression. We are looking into it; any help from the community would be welcome and appreciated.

  • https://github.com/kubeflow/kubeflow/issues/6618

ryansteakley avatar Aug 20 '22 01:08 ryansteakley

Hello, I successfully installed KF v1.6.0-rc.1 on Oracle Cloud Infrastructure (OKE 1.22.5, 1.23.4 and 1.24.1).

  • Exposed the KF dashboard using LB and Istio gateway.
  • Ran demo pipelines such as [Demo] XGBoost - Iterative model training and [Tutorial] Data passing in python components.
  • Created a Notebook through the web UI with success.
    • One problem with the Mnist E2E Vanilla demo, related to the ipykernel/iostream.py version; I will create an issue to document it.

I am waiting to test the next RC and to share the GitHub page with OCI documentation.

julioo avatar Aug 23 '22 22:08 julioo

Created https://github.com/kubeflow/kubeflow/pull/6624 to address https://github.com/kubeflow/kubeflow/issues/6618

surajkota avatar Aug 24 '22 02:08 surajkota

> • One problem with the Mnist E2E Vanilla demo, related to the ipykernel/iostream.py version; I will create an issue to document it.

Created kubeflow/examples/issues/993 to document the issue.

julioo avatar Aug 24 '22 21:08 julioo

@kimwnasptd @yuzisun It would be great to consider this PR https://github.com/kubeflow/kubeflow/pull/6627 for this release. Details in the issue

surajkota avatar Aug 25 '22 01:08 surajkota

Hi distribution owners, a new notebook RC with the fix for the image group issue is planned to be cut by this upcoming Tuesday.

The new RC might include more than the image group fix. It may also include the fix for the profiler issue https://github.com/kubeflow/kubeflow/issues/6618; I hope Notebook WG lead @kimwnasptd can share more.

As for the other issues that were raised, none of them have been labeled as blockers by the WG leads so far, so they are not being tracked as release-blocking issues for this release.

Please don't forget to update the following distribution docs before the docs deadline, EOD Aug 31st.

  • https://www.kubeflow.org/docs/started/installing-kubeflow/
  • https://www.kubeflow.org/docs/distributions/

annajung avatar Aug 26 '22 19:08 annajung

Hello, successfully installed KF v1.6.0-rc.1 on OCP 4.9. The ongoing testing issue can be tracked here - https://github.com/opendatahub-io/manifests/issues/99.

VaishnaviHire avatar Aug 29 '22 14:08 VaishnaviHire