Ting Lu
The mentioned distributed tests fail when too few GPUs are available, so the default world size needs to be corrected. cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar...
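As a rough illustration of the idea only (the actual change in that PR is not shown here), a test helper could cap the default world size at the number of GPUs present. The function name `effective_world_size` and its parameters are hypothetical; in a real test the GPU count would typically come from `torch.cuda.device_count()`.

```python
def effective_world_size(default_world_size: int, available_gpus: int) -> int:
    """Cap a test's default world size at the number of visible GPUs.

    Hypothetical helper for illustration; not taken from the PyTorch test suite.
    """
    if default_world_size < 1:
        raise ValueError("default_world_size must be at least 1")
    if available_gpus < 1:
        # With no GPU, a GPU-backed distributed test cannot run at all.
        raise ValueError("no GPUs available for a GPU-backed distributed test")
    # One rank per GPU at most: never ask for more ranks than devices.
    return min(default_world_size, available_gpus)


if __name__ == "__main__":
    # A test written for 4 ranks on a 2-GPU machine would run with 2 ranks.
    print(effective_world_size(4, 2))
```

Capping at the device count keeps one rank per GPU; a test that needs a fixed minimum number of ranks would have to be skipped instead of shrunk.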
Rebasing https://github.com/pytorch/pytorch/pull/124112. There were too many conflicting files, so starting a new PR. Test https://github.com/pytorch/builder/pull/1775 (merged) for ARM wheel addition. Test https://github.com/pytorch/builder/pull/1828 (merged) for setting MAX_JOBS. cc @ptrblck @nWEIdia @atalman @malfet @Aidyn-A
To fix the error below, run `apt install libopenblas-dev`:
```
File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 289, in <module>
    from torch._C import *  # noqa: F403
ImportError: libgfortran.so.5: cannot open shared object...
```
As part of the process of adding the CUDA ARM nightly wheel, an OOM error occurs while building flash_attn after adding https://github.com/pytorch/builder/pull/1775/files to nightly CI.
```
2024-04-26T02:20:01.5211732Z [6579/6896] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim192_bf16_sm80.cu.o...
```
This PR adds a magma build for CUDA 12.6 according to https://github.com/pytorch/builder/blob/main/CUDA_UPGRADE_GUIDE.MD. Reference PR: https://github.com/pytorch/builder/pull/1722. Related to https://github.com/pytorch/pytorch/issues/138440. cc @atalman
### System Info

root@eedf435b757f:/opt/pytorch/lightning-thunder# uname -a
Linux eedf435b757f 5.15.0-101-generic #111-Ubuntu SMP Tue Mar 5 20:16:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

GPU - A100
bitsandbytes version 0.42.0

### Reproduction

Tried...
SBSA+CUDA is building beyond the sm90 specified in https://github.com/pytorch/pytorch/blob/22dfb5b6cf9a9a92ff1f6c411a285dc1b2dd9f75/.ci/manywheel/build_cuda.sh#L64
```
python -c "import torch; print(torch.cuda.get_arch_list())"
['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a']
```
The root cause seems to be the ARM build missing TORCH_CUDA_ARCH_LIST...
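One way to spot this kind of mismatch is to compare the output of `torch.cuda.get_arch_list()` with the architectures requested through `TORCH_CUDA_ARCH_LIST`. The sketch below is illustrative only: the helper names are hypothetical, the `"9.0"` value is an assumed example and not the value from `build_cuda.sh`, and `+PTX` suffixes are simply dropped instead of being mapped to `compute_*` entries.

```python
import re


def arch_list_to_sm(arch_list: str) -> list:
    """Convert a TORCH_CUDA_ARCH_LIST-style string ("8.0;9.0a") to sm_* names."""
    names = []
    for entry in re.split(r"[;\s]+", arch_list.strip()):
        if not entry:
            continue
        # Simplification: ignore the "+PTX" suffix for this comparison.
        entry = entry.replace("+PTX", "")
        major, _, minor = entry.partition(".")
        names.append(f"sm_{major}{minor}")
    return names


def unexpected_archs(built: list, expected: list) -> list:
    """Return the built architectures that were not requested, in built order."""
    expected_set = set(expected)
    return [arch for arch in built if arch not in expected_set]


if __name__ == "__main__":
    # Arch list as reported in the issue above.
    built = ["sm_50", "sm_80", "sm_86", "sm_89", "sm_90", "sm_90a"]
    # Assumed example value only; the real setting lives in the build script.
    expected = arch_list_to_sm("9.0")
    print(unexpected_archs(built, expected))
```

In a real check, `built` would come from `torch.cuda.get_arch_list()` and the expected string from the build environment.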