terapipe
Thanks for sharing the code. I have a question about the attention module in "transformer_models.py": assume there are 10 tokens in a sentence; I think the...
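For readers following this question, here is a minimal sketch of causal self-attention over a 10-token sequence, just to make the setup concrete. It is not the repository's code: it uses a single head and skips the Q/K/V projections for brevity.

```python
import torch

seq_len, d_model = 10, 64
x = torch.randn(seq_len, d_model)           # one sequence of 10 token embeddings
q, k, v = x, x, x                           # single head, no projections, for brevity

scores = q @ k.T / d_model ** 0.5           # (10, 10) attention scores
# causal mask: token i may attend only to tokens j <= i
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)        # rows sum to 1 over the visible prefix
out = attn @ v                              # (10, 64) attended representations
```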
I notice that this repository still splits the input sequence uniformly. Could you please tell us when you will release the dynamic programming version...
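For context, "uniform splitting" here means cutting the sequence into equal-length slices along the token dimension, as opposed to the non-uniform slicing the paper's dynamic program selects. A hedged sketch, where `uniform_slices` is an illustrative helper and not the repository's API:

```python
def uniform_slices(seq_len: int, num_slices: int) -> list[int]:
    """Partition seq_len tokens into num_slices slices as evenly as possible."""
    base, rem = divmod(seq_len, num_slices)
    return [base + (1 if i < rem else 0) for i in range(num_slices)]

print(uniform_slices(2048, 8))  # [256, 256, 256, 256, 256, 256, 256, 256]
```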
Makes the rank-0 GPU responsible for computing the embedding and the softmax. Both ends of the pipeline now connect to the rank-0 GPU. For GPT3-3hm, 8 slices, mixed precision, same...
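A hedged sketch of the topology this PR describes, as I read it: rank 0 runs the embedding at the pipeline entry and the softmax at the exit, so activations flow around a ring and return to rank 0. The helper name is illustrative only.

```python
def next_rank(rank: int, world_size: int) -> int:
    """Activations flow 0 -> 1 -> ... -> world_size-1 -> 0 (back for softmax)."""
    return (rank + 1) % world_size

world_size = 4
hops = [0]
for _ in range(world_size):
    hops.append(next_rank(hops[-1], world_size))
print(" -> ".join(map(str, hops)))  # 0 -> 1 -> 2 -> 3 -> 0
```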
Hi, thanks for your great work. I would like to ask a question about the DP algorithm. If a newly arriving request in another batch has a different sequence length from the current...
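To make the question concrete: the slicing plan produced by the dynamic program depends on the sequence length, so a request of a different length would need its own plan (or a cached one). Below is a hedged sketch of the kind of DP the TeraPipe paper describes, minimizing the standard pipeline latency objective (sum of slice costs plus the bottleneck slice cost paid once per extra stage). The `cost` function is a made-up stand-in for a measured per-slice runtime, assumed nondecreasing in slice length.

```python
import functools

def optimal_slicing(L: int, num_stages: int, cost) -> tuple[float, list[int]]:
    @functools.lru_cache(maxsize=None)
    def best(remaining: int, cap: float):
        # minimal sum of slice costs covering `remaining` tokens, each slice cost <= cap
        if remaining == 0:
            return 0.0, ()
        best_val, best_cut = float("inf"), ()
        for l in range(1, remaining + 1):
            c = cost(l)
            if c > cap:
                break  # assumes cost is nondecreasing in slice length
            val, cuts = best(remaining - l, cap)
            if c + val < best_val:
                best_val, best_cut = c + val, (l,) + cuts
        return best_val, best_cut

    best_latency, best_plan = float("inf"), []
    for cap in sorted({cost(l) for l in range(1, L + 1)}):  # candidate bottlenecks
        total, cuts = best(L, cap)
        latency = total + (num_stages - 1) * cap
        if latency < best_latency:
            best_latency, best_plan = latency, list(cuts)
    return best_latency, best_plan

lat, plan = optimal_slicing(16, num_stages=4, cost=lambda l: 1.0 + 0.1 * l * l)
print(lat, plan)  # a shorter sequence would yield a different plan
```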
@zhuohan123 Hi Zhuohan, thanks for your cool work on pipeline parallelism. May I ask whether TeraPipe is implemented with the 1F1B schedule? Does it integrate with the Megatron-LM framework? Thanks!
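For readers unfamiliar with the term, here is a hedged sketch of the 1F1B (one-forward-one-backward) schedule the question refers to, as popularized by PipeDream and Megatron-LM: after a warm-up of forward passes, each stage alternates one forward and one backward microbatch. This prints the op order for one stage and is illustrative, not TeraPipe code.

```python
def one_f_one_b(stage: int, num_stages: int, num_microbatches: int) -> list[str]:
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, fwd, bwd = [], 0, 0
    for _ in range(warmup):                 # warm-up: forwards only
        ops.append(f"F{fwd}"); fwd += 1
    while bwd < num_microbatches:           # steady state: 1F1B pairs, then drain
        if fwd < num_microbatches:
            ops.append(f"F{fwd}"); fwd += 1
        ops.append(f"B{bwd}"); bwd += 1
    return ops

print(one_f_one_b(stage=0, num_stages=4, num_microbatches=8))
# ['F0', 'F1', 'F2', 'F3', 'B0', 'F4', 'B1', 'F5', 'B2', 'F6', 'B3', 'F7', 'B4', 'B5', 'B6', 'B7']
```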