PiPPy
Pipeline Parallelism for PyTorch
Per Alisson - we can reduce memory overhead by creating the global buffer at first use (i.e., just before the first fusion) rather than the current instantiation at the...
[spmd] incorrect aten.expand call with nn.linear (expanded size must match existing size at dim 0)
This is to track/investigate the issue reported by Rich Zhu, where using permute to generate a transposed tensor for nn.linear results in an incorrect aten.expand call. I've found two potential...
CI failure caused by HF changes. ``` test/hf_test.py:637: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _...
After expansion of DTensor communication operations, fx inserts a clone operation to clone the gradient tensor. This operation hurts performance and adds memory overhead, but is technically...
While running the pytests for a recent PR, I was allocated a 3-GPU server rather than a 4-GPU one (presumably a bad GPU on a 4-GPU server, but unclear...
Implement a graph using torch.cat and convert it via SPMD. Received: `NotImplementedError: Operator aten.cat.default does not have a DistributedTensor rule registered.` Code location: File "/home/ubuntu/graph/spmd/api.py", line 110, in _get_dtensor_dispatch_graph...
**What the problem is:** Both single-node and sharded `TensorParallelMultiheadAttention` (#477) modules diverge (the forward output becomes `-inf` after fewer than 10 iterations). They also produce different forward outputs, of which the...
**What the problem is:** - The sharded `TensorParallelMultiheadAttention` (#477) module fails to update the `proj.bias` parameter even though the back-propagated **gradient is correct**. - Also, this error doesn't occur on rank 0. **How to...
Passing a DTensor into `spmd.distribute_tensor`, or more specifically into `DeviceMesh`, will cause issues - in `device_mesh.broadcast`, it causes an assert to fail deep inside torch code - in...