Xilun Wu issues

Results: 11 issues


[DTensor] check DeviceMesh ranks contiguity

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #92069
* __->__ #91802
* #91801
* #91756

topic: not user facing

[spmd] self-attention not converging

**What the problem is:** Both single-node and sharded `TensorParallelMultiheadAttention` (#477) modules diverge: the forward output becomes `-inf` in fewer than 10 iterations. They also produce different forward outputs, of which the...

[spmd] self-attention module's proj.bias isn't properly updated on any rank except rank 0

**What the problem is:**
- The sharded `TensorParallelMultiheadAttention` (#477) module fails to update the `proj.bias` parameter even though the back-propagated **gradient is correct**.
- This error does not occur on rank 0.

**How to...

enable TritonFusedRMSNorm with local_map annotation

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #364

CLA Signed

enable Context Parallel

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #592

CLA Signed

[cp][flex_attention] integration test trial

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #1160

CLA Signed

module: context parallel

[draft] print blockmask sparsity

[CP] test load-balance on llama3-8B

[RFC][WIP][CP] Enable FlexAttention CP for llama3

[RFC] Lift freqs_cis as an input of models
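The "[CP] test load-balance on llama3-8B" item is listed by title only, so the balancing scheme it tests is not described here. As a hedged illustration, one commonly used approach for causal attention under Context Parallel splits the sequence into `2 * cp_size` equal chunks and gives each rank one head chunk plus its mirrored tail chunk, so every rank does the same amount of causal-masked work. Whether the llama3-8B test uses exactly this scheme is an assumption, and the functions `load_balanced_chunks` and `causal_work` are hypothetical names used only for this sketch.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def load_balanced_chunks(seq_len: int, cp_size: int, rank: int) -> List[Span]:
    """Hypothetical head-tail split: cut [0, seq_len) into 2 * cp_size equal
    chunks and give `rank` chunk `rank` plus chunk 2 * cp_size - 1 - rank."""
    num_chunks = 2 * cp_size
    if seq_len % num_chunks != 0:
        raise ValueError("seq_len must be divisible by 2 * cp_size")
    if not 0 <= rank < cp_size:
        raise ValueError("rank must be in [0, cp_size)")
    chunk = seq_len // num_chunks
    head, tail = rank, num_chunks - 1 - rank
    return [(head * chunk, (head + 1) * chunk), (tail * chunk, (tail + 1) * chunk)]


def causal_work(spans: List[Span]) -> int:
    """Count (query, key) pairs a rank computes under a causal mask:
    query position q attends to keys 0..q, i.e. q + 1 of them."""
    return sum(q + 1 for start, end in spans for q in range(start, end))


for r in range(2):
    spans = load_balanced_chunks(seq_len=16, cp_size=2, rank=r)
    print(r, spans, causal_work(spans))
# 0 [(0, 4), (12, 16)] 68
# 1 [(4, 8), (8, 12)] 68
```

For comparison, a plain contiguous split of the same 16 tokens over 2 ranks gives rank 0 only 36 (query, key) pairs and rank 1 gives 100, which is the imbalance the head-tail pairing is meant to remove.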