Pytorch cuda in registry nightly images

Open bhack opened this issue 2 years ago • 6 comments

Are you testing if the nightly image is usable with cuda?

torch.cuda._is_compiled() returns False inside the latest nightly image.
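For anyone who wants to reproduce this, here is a minimal sketch of how the check could be run against an image. It assumes Docker is available locally and that `python` is on PATH inside the image. The helper names (`build_check_command`, `parse_check_output`, `image_has_cuda_build`) are hypothetical and not part of any PyTorch tooling. As far as I understand, `torch.cuda._is_compiled()` only reports whether the build includes CUDA support, so it should not need a GPU on the host.

```python
import subprocess

# One-liner executed inside the container. torch.cuda._is_compiled() is the
# call quoted in this issue; torch.version.cuda is None on CPU-only builds.
CHECK_SNIPPET = (
    "import torch; "
    "print(torch.__version__, torch.version.cuda, torch.cuda._is_compiled())"
)


def build_check_command(image):
    """Return the docker command that runs the CUDA check inside `image`."""
    # Assumption: `python` resolves to the interpreter that has torch installed.
    return ["docker", "run", "--rm", image, "python", "-c", CHECK_SNIPPET]


def parse_check_output(stdout):
    """Return True when the last printed field is the literal 'True'."""
    fields = stdout.strip().split()
    return bool(fields) and fields[-1] == "True"


def image_has_cuda_build(image):
    """Run the check in a container; requires a local Docker daemon."""
    result = subprocess.run(
        build_check_command(image), capture_output=True, text=True, check=True
    )
    return parse_check_output(result.stdout)
```

Calling `image_has_cuda_build("ghcr.io/pytorch/pytorch:2.2.1-cuda11.8-cudnn8-devel")` would pull and run that image; swap in whichever nightly tag you want to test.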

We should add Docker images to the validation framework.

Workflow: https://github.com/pytorch/pytorch/blob/main/.github/workflows/docker-release.yml

Docker containers are located here: https://github.com/orgs/pytorch/packages/container/package/pytorch

Simple install command: docker pull ghcr.io/pytorch/pytorch:2.2.1-cuda11.8-cudnn8-devel

Build workflow: https://github.com/pytorch/pytorch/actions/runs/8200189724/job/22426518545

bhack avatar Jun 16 '23 22:06 bhack

We should add automation around validation of Docker images for both nightly and releases.

Release workflow: https://github.com/pytorch/pytorch/actions/runs/8393526521/job/22988732918

Onboard to validation framework: https://github.com/pytorch/builder/actions/workflows/validate-binaries.yml

atalman avatar Apr 02 '24 15:04 atalman

cc @juliagmt-google

atalman avatar Apr 03 '24 15:04 atalman

Thanks for sharing the task and details. Here are my questions:

  1. torch.cuda._is_compiled() is false inside the last nightly image: where can I see the output?
  2. What exactly is the validation?
  3. The container page at https://github.com/orgs/pytorch/packages/container/package/pytorch shows docker pull ghcr.io/pytorch/pytorch:2.2.2-cuda11.8-cudnn8-devel, but the instructions above say to run docker pull ghcr.io/pytorch/pytorch:2.2.1-cuda11.8-cudnn8-devel, which is a different PyTorch version. Which command should I use?
  4. Which files do we need to change to add automation and validation?

juliagmt-google avatar Apr 03 '24 17:04 juliagmt-google

Link to the images to validate: https://github.com/orgs/pytorch/packages/container/package/pytorch-nightly

Nova workflows for reference: https://github.com/pytorch/test-infra/wiki/Using-Nova-Reusable-Build-Workflows

Try to call: https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job.yml

atalman avatar Apr 04 '24 18:04 atalman

For GPU runners, we need to use pytorch/test-infra/.github/workflows/linux_job.yml@main

juliagmt-google avatar Apr 09 '24 00:04 juliagmt-google

It would be nice for the CI job to also check layer invalidation. Currently, tracking the nightly image day by day will quickly fill up artifact registry space and the local build cache. See https://github.com/pytorch/pytorch/issues/125862
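A rough sketch of what such a check might look like, assuming both images are already pulled locally and that comparing the ordered layer diff IDs reported by `docker image inspect` is a good enough proxy for cache reuse. The function names are hypothetical, and a real CI job might prefer to compare registry manifests instead.

```python
import json
import subprocess


def image_layers(image):
    """Return the ordered layer diff IDs of a locally available image."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .RootFS.Layers}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def shared_layer_prefix(old_layers, new_layers):
    """Count the leading layers that are identical in both images."""
    count = 0
    for old, new in zip(old_layers, new_layers):
        if old != new:
            break  # everything after the first differing layer is treated as invalidated
        count += 1
    return count
```

A job could call `shared_layer_prefix(image_layers(yesterday), image_layers(today))` and fail or warn when the count drops below some threshold, which would flag a nightly that invalidated the early, large layers.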

bhack avatar Jun 03 '24 10:06 bhack