cluster-api-provider-gcp

[WIP] GPU support

Open whtssub opened this issue 3 years ago • 25 comments

What type of PR is this? /kind api-change /kind feature

What this PR does / why we need it: Adds the API changes needed to manage GPU acceleration of GCP instances in CAPG.

Special notes for your reviewer: This is a WIP PR; additional controller changes will be added in follow-up commits.

TODOs:

  • [ ] squashed commits
  • [ ] includes documentation
  • [ ] adds unit tests

Release note:

NONE

cc @richardcase @dims @cpanato

whtssub avatar Jul 14 '22 06:07 whtssub

Hi @SubhasmitaSw. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 14 '22 06:07 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SubhasmitaSw. Once this PR has been reviewed and has the lgtm label, please assign dims for approval by writing /assign @dims in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Jul 22 '22 17:07 k8s-ci-robot

Added getters for the new GPU API fields so that we can read their values during reconciliation. @richardcase, do we need setters to set those values when they are not provided?

aniruddha2000 avatar Jul 23 '22 15:07 aniruddha2000
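
The getters mentioned in the comment above are not shown in this thread. As a rough illustration of the question about values that are not provided, here is a minimal Go sketch in which the getters themselves are nil-safe and apply a default, which is one possible alternative to separate setters. Every name in it (`AcceleratorConfig`, `MachineSpec`, `AcceleratorType`, `AcceleratorCount`, `defaultAcceleratorCount`) is hypothetical and is not taken from the PR or from the CAPG API; the default of 1 is likewise an assumption.

```go
package main

import "fmt"

// AcceleratorConfig is a hypothetical shape for GPU settings on a machine
// spec. The real field names in the PR may differ.
type AcceleratorConfig struct {
	// Type is the accelerator type, for example "nvidia-tesla-t4".
	Type string
	// Count is the number of accelerators to attach; nil means "not provided".
	Count *int64
}

// MachineSpec is a stand-in for the provider's machine spec (assumption).
type MachineSpec struct {
	Accelerator *AcceleratorConfig
}

// defaultAcceleratorCount is an assumed default used when a type is set
// but the count is omitted.
const defaultAcceleratorCount int64 = 1

// AcceleratorType returns the configured type, or "" when no GPU is requested.
func (s *MachineSpec) AcceleratorType() string {
	if s == nil || s.Accelerator == nil {
		return ""
	}
	return s.Accelerator.Type
}

// AcceleratorCount returns 0 when no GPU is requested, the assumed default
// when a type is set without a count, and the configured count otherwise.
func (s *MachineSpec) AcceleratorCount() int64 {
	if s == nil || s.Accelerator == nil || s.Accelerator.Type == "" {
		return 0
	}
	if s.Accelerator.Count == nil {
		return defaultAcceleratorCount
	}
	return *s.Accelerator.Count
}

func main() {
	two := int64(2)
	specs := []*MachineSpec{
		{}, // no GPU requested
		{Accelerator: &AcceleratorConfig{Type: "nvidia-tesla-t4"}},
		{Accelerator: &AcceleratorConfig{Type: "nvidia-tesla-t4", Count: &two}},
	}
	for _, s := range specs {
		fmt.Printf("type=%q count=%d\n", s.AcceleratorType(), s.AcceleratorCount())
	}
}
```

With this shape, reconciliation code can call the getters without checking for nil first. Whether defaulting belongs in getters, in a defaulting webhook, or in setters is a design decision the thread leaves open.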

@SubhasmitaSw @aniruddha2000 any status on this?

cpanato avatar Aug 22 '22 15:08 cpanato

@cpanato Only a little documentation work remains. @SubhasmitaSw and I are currently having some difficulty understanding the e2e behavior.

aniruddha2000 avatar Aug 22 '22 16:08 aniruddha2000

@cpanato Just a little bit in the documentation is remaining. I and @SubhasmitaSw are currently facing some difficulty understanding the e2e behavior.

@SubhasmitaSw @aniruddha2000 - did you want to chat about the e2e?

richardcase avatar Aug 23 '22 09:08 richardcase

@SubhasmitaSw: PR needs rebase.

k8s-ci-robot avatar Sep 14 '22 00:09 k8s-ci-robot

@SubhasmitaSw: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-gcp-build | d33fd97e47c96b3c6f7d93f93433f37d6a42dd77 | link | true | /test pull-cluster-api-provider-gcp-build |
| pull-cluster-api-provider-gcp-apidiff | d33fd97e47c96b3c6f7d93f93433f37d6a42dd77 | link | false | /test pull-cluster-api-provider-gcp-apidiff |
| pull-cluster-api-provider-gcp-verify | d33fd97e47c96b3c6f7d93f93433f37d6a42dd77 | link | true | /test pull-cluster-api-provider-gcp-verify |
| pull-cluster-api-provider-gcp-test | d33fd97e47c96b3c6f7d93f93433f37d6a42dd77 | link | true | /test pull-cluster-api-provider-gcp-test |
| pull-cluster-api-provider-gcp-make | d33fd97e47c96b3c6f7d93f93433f37d6a42dd77 | link | true | /test pull-cluster-api-provider-gcp-make |
| pull-cluster-api-provider-gcp-e2e-test | d33fd97e47c96b3c6f7d93f93433f37d6a42dd77 | link | true | /test pull-cluster-api-provider-gcp-e2e-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

k8s-ci-robot avatar Sep 23 '22 04:09 k8s-ci-robot

Any update?

FischerLGLN avatar Oct 12 '22 13:10 FischerLGLN

We need to get the images published first. I will check on that.

richardcase avatar Oct 12 '22 14:10 richardcase

We need to get the images published first. I will check on that first

let me check that today

cpanato avatar Oct 12 '22 14:10 cpanato

We need to get the images published first. I will check on that first

let me check that today

Once the GPU parameter exists, I can test whether it works flawlessly with the NVIDIA GPU Operator.

FischerLGLN avatar Oct 12 '22 15:10 FischerLGLN

Hmm, merge conflict in go.sum and go.mod

FischerLGLN avatar Oct 13 '22 11:10 FischerLGLN

I'll tidy this up!

whtssub avatar Oct 20 '22 05:10 whtssub

@SubhasmitaSw Any update?

eranco74 avatar Dec 14 '22 09:12 eranco74

/retitle [WIP] GPU support

richardcase avatar Dec 14 '22 18:12 richardcase

@SubhasmitaSw Any update?

This is currently blocked waiting for the image-builder changes to merge. I will look at unblocking this asap.

Until image builder changes are merged and images are available, adding an explicit:

/hold

richardcase avatar Dec 14 '22 18:12 richardcase

Thanks for the update, @richardcase. I'm adding myself to the subscriber list here because I am looking at the compatibility of these changes with OpenShift.

elmiko avatar Dec 22 '22 20:12 elmiko

I am picking this up again after the holiday break.

richardcase avatar Jan 03 '23 10:01 richardcase

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 26 '23 11:04 k8s-triage-robot
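
The triage rules quoted in the bot comment above amount to cumulative inactivity thresholds: 90 days to stale, 30 more to rotten, 30 more to closed. A minimal Go sketch of that arithmetic follows. It assumes a single uninterrupted idle period and labels applied exactly at each threshold; the function name `lifecycle` and the returned `"active"` and `"closed"` strings are hypothetical. The real bot runs periodically, and commands such as /remove-lifecycle change the state, which this sketch does not model.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// lifecycle maps an uninterrupted idle duration to the state described by
// the quoted rules: stale after 90d, rotten 30d later, closed 30d after that.
func lifecycle(idle time.Duration) string {
	switch {
	case idle >= 150*day: // 90d + 30d + 30d
		return "closed"
	case idle >= 120*day: // 90d + 30d
		return "lifecycle/rotten"
	case idle >= 90*day:
		return "lifecycle/stale"
	default:
		return "active"
	}
}

func main() {
	for _, d := range []int{10, 90, 120, 150} {
		fmt.Printf("%3d days idle: %s\n", d, lifecycle(time.Duration(d)*day))
	}
}
```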

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar May 26 '23 11:05 k8s-triage-robot

/remove-lifecycle rotten

richardcase avatar Jun 01 '23 09:06 richardcase

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SubhasmitaSw. Once this PR has been reviewed and has the lgtm label, please ask for approval from richardcase. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Oct 12 '23 04:10 k8s-ci-robot

@SubhasmitaSw: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-gcp-apidiff | 244060dbd0cabd48576ae28977f64ae656a7382a | link | false | /test pull-cluster-api-provider-gcp-apidiff |
| pull-cluster-api-provider-gcp-test | 244060dbd0cabd48576ae28977f64ae656a7382a | link | true | /test pull-cluster-api-provider-gcp-test |
| pull-cluster-api-provider-gcp-build | 244060dbd0cabd48576ae28977f64ae656a7382a | link | true | /test pull-cluster-api-provider-gcp-build |
| pull-cluster-api-provider-gcp-verify | 244060dbd0cabd48576ae28977f64ae656a7382a | link | true | /test pull-cluster-api-provider-gcp-verify |
| pull-cluster-api-provider-gcp-e2e-test | 244060dbd0cabd48576ae28977f64ae656a7382a | link | true | /test pull-cluster-api-provider-gcp-e2e-test |
| pull-cluster-api-provider-gcp-make | 244060dbd0cabd48576ae28977f64ae656a7382a | link | true | /test pull-cluster-api-provider-gcp-make |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

k8s-ci-robot avatar Oct 12 '23 04:10 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 22 '24 07:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 21 '24 07:02 k8s-triage-robot

/remove-lifecycle rotten

I will pick up the image building side so that we can get this merged.

richardcase avatar Feb 21 '24 10:02 richardcase

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Mar 22 '24 11:03 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot avatar Mar 22 '24 11:03 k8s-ci-robot

/reopen

richardcase avatar Mar 22 '24 14:03 richardcase

@richardcase: Reopened this PR.

In response to this:

/reopen

k8s-ci-robot avatar Mar 22 '24 14:03 k8s-ci-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Apr 21 '24 14:04 k8s-triage-robot