pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter. You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #129762...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #129802
* __->__ #129801
* #129800
### 🐛 Describe the bug

Hello, I encountered some issues while using `torch.distributed.pipelining`. I tested `PiPPy/examples/huggingface/pippy_gpt2.py` with the default configuration. Because I'm working on full-model testing, I added a...
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned audio hash.
## Issue description

> RuntimeError: CUDA error: unspecified launch failure

The error occurs on any training script. Its occurrence is not deterministic and can happen at any point during training. All...
### Approach: Using the current function declaration

**Constraint:** `Q_Heads % KV_Heads == 0`

**Major change:** It adds a meaning to the last third dimension.

**Pros:** This approach covers one major...
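The divisibility constraint above is the usual grouped-query-attention (GQA) head mapping: each KV head serves a contiguous group of `Q_Heads // KV_Heads` query heads. A minimal sketch of that mapping, assuming illustrative names (`kv_head_for_query`, `q_heads`, `kv_heads`) rather than any actual PyTorch API:

```python
# Sketch of the head grouping implied by Q_Heads % KV_Heads == 0.
# Every KV head is shared by a contiguous block of query heads.

def kv_head_for_query(q_head: int, q_heads: int, kv_heads: int) -> int:
    """Return the KV head index that query head `q_head` attends with."""
    if q_heads % kv_heads != 0:
        raise ValueError("Q_Heads must be divisible by KV_Heads")
    group_size = q_heads // kv_heads  # query heads per KV head
    return q_head // group_size

# With 8 query heads and 2 KV heads, query heads 0-3 share KV head 0
# and query heads 4-7 share KV head 1.
print([kv_head_for_query(h, 8, 2) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

When `kv_heads == q_heads` this degenerates to standard multi-head attention (one KV head per query head), and `kv_heads == 1` gives multi-query attention.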
### 🐛 Describe the bug

According to the documentation, `torch.distributed.tensor.parallel.SequenceParallel` should shard on the sequence dimension, i.e. `[B, T, C] -> [B, T//_world_size, C]`, but it seems to be tiling...
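The behavior the report expects can be sketched as follows: rank `r` keeps its contiguous `T // world_size` slice of the sequence axis of a `[B, T, C]` tensor. Plain nested lists stand in for torch tensors here; `shard_sequence` is an illustrative name, not the actual `SequenceParallel` implementation.

```python
# Minimal sketch of sequence-dimension sharding: [B, T, C] -> [B, T//world_size, C].

def shard_sequence(x, world_size, rank):
    """Shard a [B, T, C] nested list along the T (sequence) dimension."""
    T = len(x[0])
    assert T % world_size == 0, "sequence length must divide world_size evenly"
    chunk = T // world_size
    start = rank * chunk
    # Each batch element keeps only this rank's slice of timesteps.
    return [batch[start:start + chunk] for batch in x]

# A [1, 4, 2] input sharded across world_size=2:
x = [[[0, 0], [1, 1], [2, 2], [3, 3]]]
print(shard_sequence(x, 2, 0))  # rank 0 gets timesteps 0-1: [[[0, 0], [1, 1]]]
print(shard_sequence(x, 2, 1))  # rank 1 gets timesteps 2-3: [[[2, 2], [3, 3]]]
```

Tiling, by contrast, would give every rank the full `[B, T, C]` tensor (or repeat slices), which is what the report suggests is actually happening.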
In eager mode, FSDP2 pre-allocates the output buffer for AllGather, and the AllGather simply writes into that buffer. Under compile, however, we default to out-of-place AllGather, which means in Traceable FSDP2...
Fixes #95481

Test Plan: Unit-tested `checkpoint_wrapper.py` by instantiating `ActivationWrapper` and got a `TypeError` as expected.

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @fegin @XilunWu @wanchaol...