PiPPy

Pipeline Parallelism for PyTorch

Results: 123 PiPPy issues (sorted by recently updated)

Per Alisson - we can reduce memory overhead by creating the global buffer at first use (i.e. just before the first fusion) rather than the current instantiation at the...
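A minimal sketch of the lazy-allocation idea described above; the class and method names here are hypothetical, not PiPPy APIs:

```python
import torch

class FusionBuffer:
    """Hypothetical holder for the global buffer: allocation is deferred
    until first use (just before the first fusion) instead of happening
    at construction time, so unused buffers cost no memory."""

    def __init__(self, numel, dtype=torch.float32, device="cpu"):
        self._numel = numel
        self._dtype = dtype
        self._device = device
        self._buf = None  # nothing allocated yet

    def get(self):
        # Allocate lazily on first access.
        if self._buf is None:
            self._buf = torch.empty(self._numel, dtype=self._dtype, device=self._device)
        return self._buf
```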

This is to track/investigate the issue reported by Rich Zhu, where using permute to generate a transposed tensor for nn.Linear results in an incorrect aten.expand call. I've found two potential...
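A hedged, hypothetical repro of the pattern being discussed: feeding a permuted (transposed) view into nn.Linear. Eager execution works fine; the reported problem concerns the aten.expand call produced when this pattern is traced and expanded, which is not reproduced here.

```python
import torch
import torch.nn as nn

linear = nn.Linear(16, 8)
x = torch.randn(16, 4)      # (features, batch)
x_t = x.permute(1, 0)       # (batch, features) via permute: a non-contiguous view
out = linear(x_t)           # forward through nn.Linear with the permuted input
print(out.shape)            # torch.Size([4, 8])
```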

CI failure caused by HF changes. ``` test/hf_test.py:637: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _...

After expansion of DTensor communication operations, fx inserts a clone operation to clone the gradient tensor. This operation slows down performance and adds memory, but is technically...

When running the pytests for a recent PR, I was allocated a 3-GPU server rather than a 4-GPU one (presumably a bad GPU on a 4-GPU server, but unclear...

Labels: good first issue, huggingface, PiPPy

Implement a graph using torch.cat and convert it via SPMD. Receive: `NotImplementedError: Operator aten.cat.default does not have a DistributedTensor rule registered.` Code location: File "/home/ubuntu/graph/spmd/api.py", line 110, in _get_dtensor_dispatch_graph...
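A toy module of the kind described above; the class name is illustrative. Eager execution is fine, and the failure reportedly occurs when the graph is expanded through the SPMD/DTensor path (not reproduced here) because aten.cat.default has no sharding rule registered.

```python
import torch
import torch.nn as nn

class CatModule(nn.Module):
    """Illustrative module whose graph contains aten.cat.default."""

    def forward(self, x, y):
        return torch.cat([x, y], dim=-1)

m = CatModule()
print(m(torch.randn(2, 3), torch.randn(2, 3)).shape)  # torch.Size([2, 6])
```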

**What the problem is:** Both the single-node and sharded `TensorParallelMultiheadAttention` (#477) modules diverge (the forward output becomes `-inf` after fewer than 10 iterations). They also produce different forward outputs, of which the...

**What the problem is:**
- The sharded `TensorParallelMultiheadAttention` (#477) module fails to update the `proj.bias` parameter even though the back-propagated **gradient is correct**.
- This error doesn't occur on rank 0.

**How to...
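A hedged debugging sketch for this symptom: verify on each rank that every parameter with a non-zero gradient actually changes after `optimizer.step()`. `model`, `optimizer`, and `loss` are placeholders for the sharded module under test, not names from the report.

```python
import torch

def check_param_updates(model, optimizer, loss):
    # Snapshot parameters before the update.
    before = {n: p.detach().clone() for n, p in model.named_parameters()}
    loss.backward()
    optimizer.step()
    for name, p in model.named_parameters():
        has_grad = p.grad is not None and p.grad.abs().sum().item() > 0
        changed = not torch.equal(before[name], p.detach())
        if has_grad and not changed:
            print(f"{name}: gradient present but parameter did not update")
```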

Passing a DTensor into spmd.distribute_tensor, or more specifically into DeviceMesh, will cause issues:
- in device_mesh.broadcast, it will cause an assert to fail deep into torch code
- in...
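A possible defensive check suggested by this report, sketched with the `torch.distributed._tensor` API as a stand-in for `spmd.distribute_tensor` (the wrapper name is hypothetical): reject DTensor inputs up front instead of letting the assert fire deep inside DeviceMesh operations.

```python
from torch.distributed._tensor import DTensor, distribute_tensor

def safe_distribute(tensor, mesh, placements):
    # Guard: distribute_tensor expects a plain torch.Tensor, not a DTensor.
    if isinstance(tensor, DTensor):
        raise TypeError(
            "expected a plain torch.Tensor, got a DTensor; "
            "convert it with .to_local() or redistribute() first"
        )
    return distribute_tensor(tensor, mesh, placements)
```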