PyTorch CUDA Upgrade to 11.7 and Decommission 11.3 and 10.2
This issue tracks progress on adding CUDA 11.7 support and decommissioning legacy CUDA versions.
CUDA Support Matrix as of PyTorch 1.12
CUDA | CUDNN | additional details |
---|---|---|
10.2 | 7.6.5.32 | Legacy CUDA Release, to be decommissioned issue |
11.3 | 8.3.2.44 | Stable CUDA Release |
11.6 | 8.3.2.44 | Latest CUDA Release |
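The matrix above can also be written as a small lookup, which is handy when a script needs to tell whether a given CUDA version is legacy, stable, or latest. This is only a minimal sketch: the dictionary copies the table, while the names `SUPPORT_MATRIX` and `describe_cuda` and the short status labels are assumptions made for illustration, not part of any PyTorch tooling.

```python
# Support matrix copied from the table above (PyTorch 1.12).
# The short status labels are an illustrative shorthand for the "additional details" column.
SUPPORT_MATRIX = {
    "10.2": {"cudnn": "7.6.5.32", "status": "legacy"},
    "11.3": {"cudnn": "8.3.2.44", "status": "stable"},
    "11.6": {"cudnn": "8.3.2.44", "status": "latest"},
}


def describe_cuda(version):
    """Return a one-line description for a CUDA version string such as '11.3'."""
    entry = SUPPORT_MATRIX.get(version)
    if entry is None:
        return f"CUDA {version}: not in the 1.12 support matrix"
    return f"CUDA {version}: {entry['status']} (cuDNN {entry['cudnn']})"


if __name__ == "__main__":
    for v in ("10.2", "11.6", "11.7"):
        print(describe_cuda(v))
```

With PyTorch installed, `torch.version.cuda` reports the CUDA version the binary was built against (or `None` for CPU-only builds), so its value could be passed to `describe_cuda`.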
Pre CUDA 11.7 Upgrade
The following issue must be resolved before CUDA 11.6 can be promoted to the stable version, and we want to address it before the CUDA 11.7 upgrade.
- [x] https://github.com/pytorch/pytorch/issues/69691 Conda-forge dependency on cudatoolkit for 11.6. In short: since CUDA 11.5, cudatoolkit is only available on the conda-forge channel. We should migrate from cudatoolkit to cuda and drop the conda-forge dependency from pytorch, torchvision and torchaudio. This work should be scheduled and addressed as soon as we cut release 1.12 for pytorch and all domain libraries.
Decommission CUDA 10.2
Ideally this is addressed before the CUDA 11.7 upgrade, but it can also be done in parallel.
- [x] https://github.com/pytorch/builder/issues/1026 Decommission CUDA 10.2 support. We have an open issue to track this: issue and related discussion. With CUDA 11+, users cannot download it from pip, and pip is a very popular package manager.
Upgrade CUDA 11.7
As per https://github.com/pytorch/builder/blob/main/CUDA_UPGRADE_GUIDE.MD
- [x] Installing to conda-builder and libtorch containers
  - [x] Push pytorch/conda-builder
  - [x] Push the libtorch image
- [x] Add setup to manywheels
  - [x] Push pytorch/manylinux-builder
- [x] Update MAGMA
  - [x] Push magma-cuda117 to conda
  - [x] Add magma for windows into our S3
- [x] Add Windows builder for 11.7
  - [x] Check if driver needs to be updated
  - [x] Add fixes that had to come up
- [x] Include CUDA 11.7 into our nightly matrix
  - [x] Update conda `build_pytorch.sh` script and add conda binaries
    - [x] Windows
    - [x] Linux
    - [x] MacOS
  - [x] Add fixes that had to come up
  - [x] Update conda
- [x] Create 11.7 CI
  - [x] Windows
  - [x] Linux + add MAGMA to CI conda
- [x] Add 11.7 to torchvision CI
- [x] Add 11.7 to torchaudio CI
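To illustrate what "include CUDA 11.7 into our nightly matrix" amounts to, the sketch below expands a list of platforms, package types and CUDA versions into individual build entries. It is not the builder repository's actual matrix generator: `nightly_matrix`, the platform and package-type lists, and the label format are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical inputs; the real nightly matrix may cover different
# platforms, package types, and CUDA versions.
PLATFORMS = ("linux", "windows")
PACKAGE_TYPES = ("conda", "wheel", "libtorch")
CUDA_VERSIONS = ("11.6", "11.7")


def nightly_matrix(platforms=PLATFORMS, packages=PACKAGE_TYPES, cudas=CUDA_VERSIONS):
    """Expand every (platform, package, CUDA) combination into one build entry."""
    entries = []
    for os_name, pkg, cuda in product(platforms, packages, cudas):
        entries.append({
            "os": os_name,
            "package": pkg,
            "cuda": cuda,
            # Illustrative label, e.g. "conda-linux-cu117"
            "label": f"{pkg}-{os_name}-cu{cuda.replace('.', '')}",
        })
    return entries


if __name__ == "__main__":
    for entry in nightly_matrix():
        print(entry["label"])
```

Under these assumptions, adding a new CUDA version means appending it to `CUDA_VERSIONS`; every platform and package type then picks it up without further changes.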
Past Issues to be Resolved by upgrade (needs to be retested)
- [ ] https://github.com/pytorch/pytorch/issues/75391
- [ ] https://github.com/pytorch/pytorch/issues/75375
- [x] https://github.com/pytorch/pytorch/issues/70111
- [x] https://github.com/pytorch/pytorch/issues/69460
- [x] https://github.com/pytorch/pytorch/issues/69023
- [x] https://github.com/pytorch/pytorch/issues/57482
Post CUDA 11.7 Upgrade
- [x] #1106
- [x] #1123
- [x] Move CUDA 11.6 as Stable CUDA
Target End State
CUDA 11.6 - Stable, CUDA 11.7 - Latest Experimental, CUDA 10.2 and CUDA 11.3 - Decommissioned
BE tasks for Meta Team
- [ ] Eliminate runbook manual step 6 by fixing this issue https://github.com/pytorch/test-infra/issues/92
cc @ptrblck @malfet @seemethere @ezyang @pytorch/pytorch-dev-infra @ngimel
For 11.7: Created two PRs to add the docker and magma builds.
CC @crcrpar
Could you check the launch bounds for `torch.mode` in 11.7, please?
CC @IvanYashchuk Adding you for potentially needed MAGMA fixes
I believe this is now complete, correct, @atalman?