PyTorch CUDA Upgrade to 11.7 and Decommission 11.3 and 10.2

Open atalman opened this issue 2 years ago • 2 comments

This issue tracks progress on adding CUDA 11.7 support and decommissioning legacy CUDA versions.

CUDA Support Matrix as of PyTorch 1.12

CUDA   cuDNN      Additional details
10.2   7.6.5.32   Legacy CUDA release, to be decommissioned (see issue)
11.3   8.3.2.44   Stable CUDA release
11.6   8.3.2.44   Latest CUDA release
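
For scripting purposes, the matrix above can be written down as a small lookup table. This is only a minimal sketch, not code from the builder repository: the versions are taken from the table, while the names `SUPPORT_MATRIX`, `cudnn_for` and `versions_with_status` are hypothetical.

```python
# Hypothetical encoding of the PyTorch 1.12 support matrix listed above.
# Versions come from the table; the structure and names are assumptions.
SUPPORT_MATRIX = {
    "10.2": {"cudnn": "7.6.5.32", "status": "legacy"},
    "11.3": {"cudnn": "8.3.2.44", "status": "stable"},
    "11.6": {"cudnn": "8.3.2.44", "status": "latest"},
}


def cudnn_for(cuda_version: str) -> str:
    """Return the cuDNN version paired with a CUDA version in the matrix."""
    try:
        return SUPPORT_MATRIX[cuda_version]["cudnn"]
    except KeyError:
        raise ValueError(f"CUDA {cuda_version} is not in the 1.12 matrix") from None


def versions_with_status(status: str) -> list:
    """List the CUDA versions that carry the given status label."""
    return [v for v, row in SUPPORT_MATRIX.items() if row["status"] == status]
```

For example, `cudnn_for("11.6")` looks up the cuDNN version paired with the Latest CUDA release, and an unknown version such as `"11.7"` raises `ValueError` until a row is added for it.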

Pre CUDA 11.7 Upgrade

The following issue must be resolved before CUDA 11.6 can become the Stable version, and we want to address it before the CUDA 11.7 upgrade.

  • [x] https://github.com/pytorch/pytorch/issues/69691 Conda-forge dependency on cudatoolkit for 11.6. In short: since CUDA 11.5, cudatoolkit is only available on the conda-forge channel. We should migrate from cudatoolkit to cuda and drop the use of conda-forge in pytorch, torchvision and torchaudio. This work should be scheduled and addressed as soon as we cut release 1.12 for pytorch and all domain libraries.

Decommission CUDA 10.2

Ideally this is completed before the CUDA 11.7 upgrade, but it can also proceed in parallel with it.

  • [x] https://github.com/pytorch/builder/issues/1026 Decommission CUDA 10.2 Support. We have an open issue tracking this, along with a related discussion. With CUDA 11+, users cannot download it from pip, and pip is a very popular package manager.

Upgrade CUDA 11.7

As per https://github.com/pytorch/builder/blob/main/CUDA_UPGRADE_GUIDE.MD

  • [x] Installing to conda-builder and libtorch containers
    • [x] Push pytorch/conda-builder
    • [x] Push the libtorch image
  • [x] Add setup to manywheels
    • [x] Push pytorch/manylinux-builder
  • [x] Update MAGMA
    • [x] Push magma-cuda117 to conda
    • [x] Add magma for windows into our S3
  • [x] Add Windows builder for 11.7
    • [x] Check if driver needs to be updated
    • [x] Add any fixes that turned out to be needed
  • [x] Include CUDA 11.7 into our nightly matrix
    • [x] Update conda build_pytorch.sh script and add conda binaries
    • [x] Windows
    • [x] Linux
    • [x] MacOS
    • [x] Add any fixes that turned out to be needed
  • [x] Create 11.7 CI
    • [x] Windows
    • [x] Linux + add MAGMA to CI conda
  • [x] Add 11.7 to torchvision CI
  • [x] Add 11.7 to torchaudio CI
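
Several checklist items refer to artifacts whose names embed the CUDA version in compact form, such as `magma-cuda117`. A minimal sketch of that naming convention follows. The helper names are hypothetical, and the `cu117` style suffix is the one commonly seen on PyTorch CUDA wheels rather than something stated in this issue.

```python
def cuda_tag(cuda_version: str) -> str:
    """Turn a dotted version such as '11.7' into the compact form '117'."""
    major, minor = cuda_version.split(".")
    return f"{major}{minor}"


def magma_package(cuda_version: str) -> str:
    # Follows the package name used in the checklist: magma-cuda117.
    return f"magma-cuda{cuda_tag(cuda_version)}"


def wheel_suffix(cuda_version: str) -> str:
    # Assumed convention: 'cu117' style suffix on CUDA wheel versions.
    return f"cu{cuda_tag(cuda_version)}"
```

With these helpers, `magma_package("11.7")` yields the `magma-cuda117` name from the checklist. A version string without exactly one dot raises `ValueError` from the tuple unpacking.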

Past Issues Expected to Be Resolved by the Upgrade (need to be retested)

  • [ ] https://github.com/pytorch/pytorch/issues/75391
  • [ ] https://github.com/pytorch/pytorch/issues/75375
  • [x] https://github.com/pytorch/pytorch/issues/70111
  • [x] https://github.com/pytorch/pytorch/issues/69460
  • [x] https://github.com/pytorch/pytorch/issues/69023
  • [x] https://github.com/pytorch/pytorch/issues/57482

Post CUDA 11.7 Upgrade

  • [x] #1106
  • [x] #1123
  • [x] Move CUDA 11.6 as Stable CUDA

Target End State

CUDA 11.6 - Stable, CUDA 11.7 - Latest Experimental, CUDA 10.2 and CUDA 11.3 - Decommissioned
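
The end state above can be expressed as a lookup for checking where a given build stands. This is a sketch under stated assumptions: `TARGET_STATE` and `target_status` are hypothetical names, and the fallback labels for unknown versions and CPU-only builds are illustrative.

```python
from typing import Optional

# Target end state from this issue; the dict layout is an assumption.
TARGET_STATE = {
    "10.2": "decommissioned",
    "11.3": "decommissioned",
    "11.6": "stable",
    "11.7": "latest experimental",
}


def target_status(cuda_version: Optional[str]) -> str:
    """Classify a CUDA version string against the target end state."""
    if cuda_version is None:
        # No CUDA version reported, e.g. a CPU-only build.
        return "cpu-only build"
    return TARGET_STATE.get(cuda_version, "not covered by this issue")
```

In an environment with PyTorch installed, the argument could come from `torch.version.cuda`, which is a string such as `"11.7"` on CUDA builds and `None` on CPU-only builds.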

BE tasks for Meta Team

  • [ ] Eliminate runbook manual step 6 by fixing this issue https://github.com/pytorch/test-infra/issues/92

cc @ptrblck @malfet @seemethere @ezyang @pytorch/pytorch-dev-infra @ngimel

atalman, Jun 01 '22 15:06

For 11.7: Created two PRs to add the Docker and MAGMA builds.

CC @crcrpar Could you check the launch bounds for torch.mode in 11.7, please?

CC @IvanYashchuk Adding you for potentially needed MAGMA fixes

ptrblck, Jun 03 '22 06:06

I believe this is now complete, correct @atalman?

bryantbiggs, Mar 28 '24 12:03