llm-foundry
Tensor Parallel MLP with torch 2.0 (PR #192)
This PR adds torch 2.0-based tensor parallel support for the FFN block. It is ported over from https://github.com/mosaicml/examples/pull/255
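As an illustration of the approach, here is a minimal sketch using the torch 2.0 prototype tensor-parallel API; the `FFN` module and its attribute names below are placeholders, not llm-foundry's actual block:

```python
# Illustrative sketch only: torch 2.0's prototype tensor-parallel API applied to a
# generic two-Linear FFN. Module and attribute names (FFN, up_proj, down_proj) are
# placeholders, not llm-foundry code.
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh
from torch.distributed.tensor.parallel import PairwiseParallel, parallelize_module


class FFN(torch.nn.Module):
    def __init__(self, d_model: int, expansion_ratio: int = 4):
        super().__init__()
        self.up_proj = torch.nn.Linear(d_model, expansion_ratio * d_model)
        self.act = torch.nn.GELU()
        self.down_proj = torch.nn.Linear(expansion_ratio * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act(self.up_proj(x)))


def tp_ffn(ffn: FFN) -> FFN:
    # 1-D device mesh over all ranks; PairwiseParallel shards the first Linear
    # column-wise and the second row-wise (Megatron-style), so the intermediate
    # activation stays sharded and only the block output is all-reduced.
    mesh = DeviceMesh("cuda", list(range(dist.get_world_size())))
    return parallelize_module(ffn, mesh, PairwiseParallel())
```

`PairwiseParallel` pairs up the two `Linear` layers (column-parallel followed by row-parallel), which is the same layout the PR applies to the FFN block.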
Currently, the trained weights don't match between the parallel and non-parallel versions, even in a simple example. Filed a PyTorch issue: https://github.com/pytorch/pytorch/issues/102280
Test: I'll add a test (the one submitted in the PyTorch issue) once this works correctly.
@vchiley: Are these test failures due to a recent change? They don't look related to this PR.
All tests passed as of the last PR merge.
Is self-attention parallelizable with some code modification?
It is, but with code modifications. See https://pytorch.org/docs/stable/_modules/torch/distributed/tensor/parallel/multihead_attention_tp.html#TensorParallelMultiheadAttention
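For illustration, here is a minimal forward-pass sketch of the kind of modification involved, written with plain collectives in the Megatron-LM style rather than using the linked TensorParallelMultiheadAttention; all module and attribute names (ShardedSelfAttention, Wqkv, out_proj) are hypothetical:

```python
# Forward-pass illustration only (plain collectives, not the DTensor styles): each
# rank owns a slice of the heads, so Wqkv is column-sharded, out_proj is row-sharded,
# and the per-rank partial outputs are summed with an all-reduce.
import torch
import torch.distributed as dist


class ShardedSelfAttention(torch.nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        world_size = dist.get_world_size()
        assert n_heads % world_size == 0 and d_model % n_heads == 0
        self.local_heads = n_heads // world_size              # heads owned by this rank
        self.head_dim = d_model // n_heads
        local_dim = self.local_heads * self.head_dim
        self.Wqkv = torch.nn.Linear(d_model, 3 * local_dim)   # column-sharded QKV proj
        self.out_proj = torch.nn.Linear(local_dim, d_model)   # row-sharded output proj

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model), replicated on every rank.
        b, s, _ = x.shape
        q, k, v = self.Wqkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, s, self.local_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        out = self.out_proj(attn.transpose(1, 2).reshape(b, s, -1))
        dist.all_reduce(out)  # sum the per-rank partial results
        return out
```

Note that this forward-only sketch omits the matching identity/all-reduce pair needed in the backward pass for training; the DTensor-based parallel styles handle that bookkeeping.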
Thanks! May I know if you plan to add it to this PR?
@dskhudia should we close this?
@dakinggg yes.