llm-foundry
Tensor Parallel MLP with torch 2.0 (PR #192)
This PR adds torch 2.0-based tensor parallel support for the FFN block. It is ported over from https://github.com/mosaicml/examples/pull/255
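As an illustration of the approach, here is a minimal sketch using the torch 2.0 prototype tensor-parallel API; the `FFN` module and its attribute names below are placeholders, not llm-foundry's actual block:

```python
# Illustrative sketch only: torch 2.0's prototype tensor-parallel API applied to a
# generic two-Linear FFN. Module and attribute names (FFN, up_proj, down_proj) are
# placeholders, not llm-foundry code.
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh
from torch.distributed.tensor.parallel import PairwiseParallel, parallelize_module


class FFN(torch.nn.Module):
    def __init__(self, d_model: int, expansion_ratio: int = 4):
        super().__init__()
        self.up_proj = torch.nn.Linear(d_model, expansion_ratio * d_model)
        self.act = torch.nn.GELU()
        self.down_proj = torch.nn.Linear(expansion_ratio * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act(self.up_proj(x)))


def tp_ffn(ffn: FFN) -> FFN:
    # 1-D device mesh over all ranks; PairwiseParallel shards the first Linear
    # column-wise and the second row-wise (Megatron-style), so the intermediate
    # activation stays sharded and only the block output is all-reduced.
    mesh = DeviceMesh("cuda", list(range(dist.get_world_size())))
    return parallelize_module(ffn, mesh, PairwiseParallel())
```

`PairwiseParallel` pairs up the two `Linear` layers (column-parallel followed by row-parallel), which is the same layout the PR applies to the FFN block.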
Currently, the trained weights don't match between the parallel and non-parallel versions, even in a simple example. Filed a PyTorch issue: https://github.com/pytorch/pytorch/issues/102280
Test: I'll add a test (the one submitted in the PyTorch issue) once this works correctly.
@vchiley: Are these test failures due to a recent change? They don't look related to this PR.
All tests passed as of the last PR merge.
Is self-attention parallelizable with some code modification?
It is, but with code modifications. See https://pytorch.org/docs/stable/_modules/torch/distributed/tensor/parallel/multihead_attention_tp.html#TensorParallelMultiheadAttention
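For illustration, here is a minimal forward-pass sketch of the kind of modification involved, written with plain collectives in the Megatron-LM style rather than using the linked TensorParallelMultiheadAttention; all module and attribute names (ShardedSelfAttention, Wqkv, out_proj) are hypothetical:

```python
# Forward-pass illustration only (plain collectives, not the DTensor styles): each
# rank owns a slice of the heads, so Wqkv is column-sharded, out_proj is row-sharded,
# and the per-rank partial outputs are summed with an all-reduce.
import torch
import torch.distributed as dist


class ShardedSelfAttention(torch.nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        world_size = dist.get_world_size()
        assert n_heads % world_size == 0 and d_model % n_heads == 0
        self.local_heads = n_heads // world_size              # heads owned by this rank
        self.head_dim = d_model // n_heads
        local_dim = self.local_heads * self.head_dim
        self.Wqkv = torch.nn.Linear(d_model, 3 * local_dim)   # column-sharded QKV proj
        self.out_proj = torch.nn.Linear(local_dim, d_model)   # row-sharded output proj

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model), replicated on every rank.
        b, s, _ = x.shape
        q, k, v = self.Wqkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, s, self.local_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        out = self.out_proj(attn.transpose(1, 2).reshape(b, s, -1))
        dist.all_reduce(out)  # sum the per-rank partial results
        return out
```

Note that this forward-only sketch omits the matching identity/all-reduce pair needed in the backward pass for training; the DTensor-based parallel styles handle that bookkeeping.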
Thanks! May I know if you plan to add it to this PR?
@dskhudia should we close this?
@dakinggg yes.