terapipe
Thanks for sharing the code. I have a question about the attention module in "transformer_models.py": assume there are 10 tokens in a sentence; I think the...
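For readers following this question, here is a minimal sketch of causal self-attention over a 10-token sequence, just to make the setup concrete. It is not the repository's code: it uses a single head and skips the Q/K/V projections for brevity.

```python
import torch

seq_len, d_model = 10, 64
x = torch.randn(seq_len, d_model)           # one sequence of 10 token embeddings
q, k, v = x, x, x                           # single head, no projections, for brevity

scores = q @ k.T / d_model ** 0.5           # (10, 10) attention scores
# causal mask: token i may attend only to tokens j <= i
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)        # rows sum to 1 over the visible prefix
out = attn @ v                              # (10, 64) attended representations
```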
I notice that this repository still splits the input sequence uniformly. Could you please tell us when you will release the dynamic programming version...
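For context, "uniform splitting" here means cutting the sequence into equal-length slices along the token dimension, as opposed to the non-uniform slicing the paper's dynamic program selects. A hedged sketch, where `uniform_slices` is an illustrative helper and not the repository's API:

```python
def uniform_slices(seq_len: int, num_slices: int) -> list[int]:
    """Partition seq_len tokens into num_slices slices as evenly as possible."""
    base, rem = divmod(seq_len, num_slices)
    return [base + (1 if i < rem else 0) for i in range(num_slices)]

print(uniform_slices(2048, 8))  # [256, 256, 256, 256, 256, 256, 256, 256]
```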
Makes the rank-0 GPU responsible for computing the embedding and the softmax. Both ends of the pipeline now connect to the rank-0 GPU. For GPT3-3hm, 8 slices, mixed precision, same...
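A hedged sketch of the topology this PR describes, as I read it: rank 0 runs the embedding at the pipeline entry and the softmax at the exit, so activations flow around a ring and return to rank 0. The helper name is illustrative only.

```python
def next_rank(rank: int, world_size: int) -> int:
    """Activations flow 0 -> 1 -> ... -> world_size-1 -> 0 (back for softmax)."""
    return (rank + 1) % world_size

world_size = 4
hops = [0]
for _ in range(world_size):
    hops.append(next_rank(hops[-1], world_size))
print(" -> ".join(map(str, hops)))  # 0 -> 1 -> 2 -> 3 -> 0
```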
Hi, thanks for your great work. I would like to ask a question about the DP algorithm. If a newly arriving request in another batch has a different sequence length from the current...
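To make the question concrete: the slicing plan produced by the dynamic program depends on the sequence length, so a request of a different length would need its own plan (or a cached one). Below is a hedged sketch of the kind of DP the TeraPipe paper describes, minimizing the standard pipeline latency objective (sum of slice costs plus the bottleneck slice cost paid once per extra stage). The `cost` function is a made-up stand-in for a measured per-slice runtime, assumed nondecreasing in slice length.

```python
import functools

def optimal_slicing(L: int, num_stages: int, cost) -> tuple[float, list[int]]:
    @functools.lru_cache(maxsize=None)
    def best(remaining: int, cap: float):
        # minimal sum of slice costs covering `remaining` tokens, each slice cost <= cap
        if remaining == 0:
            return 0.0, ()
        best_val, best_cut = float("inf"), ()
        for l in range(1, remaining + 1):
            c = cost(l)
            if c > cap:
                break  # assumes cost is nondecreasing in slice length
            val, cuts = best(remaining - l, cap)
            if c + val < best_val:
                best_val, best_cut = c + val, (l,) + cuts
        return best_val, best_cut

    best_latency, best_plan = float("inf"), []
    for cap in sorted({cost(l) for l in range(1, L + 1)}):  # candidate bottlenecks
        total, cuts = best(L, cap)
        latency = total + (num_stages - 1) * cap
        if latency < best_latency:
            best_latency, best_plan = latency, list(cuts)
    return best_latency, best_plan

lat, plan = optimal_slicing(16, num_stages=4, cost=lambda l: 1.0 + 0.1 * l * l)
print(lat, plan)  # a shorter sequence would yield a different plan
```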
@zhuohan123 Hi Zhuohan, thanks for your cool work on pipeline parallelism. May I ask whether TeraPipe is implemented with the 1F1B schedule? Does it integrate with the Megatron-LM framework? Thanks!
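For readers unfamiliar with the term, here is a hedged sketch of the 1F1B (one-forward-one-backward) schedule the question refers to, as popularized by PipeDream and Megatron-LM: after a warm-up of forward passes, each stage alternates one forward and one backward microbatch. This prints the op order for one stage and is illustrative, not TeraPipe code.

```python
def one_f_one_b(stage: int, num_stages: int, num_microbatches: int) -> list[str]:
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, fwd, bwd = [], 0, 0
    for _ in range(warmup):                 # warm-up: forwards only
        ops.append(f"F{fwd}"); fwd += 1
    while bwd < num_microbatches:           # steady state: 1F1B pairs, then drain
        if fwd < num_microbatches:
            ops.append(f"F{fwd}"); fwd += 1
        ops.append(f"B{bwd}"); bwd += 1
    return ops

print(one_f_one_b(stage=0, num_stages=4, num_microbatches=8))
# ['F0', 'F1', 'F2', 'F3', 'B0', 'F4', 'B1', 'F5', 'B2', 'F6', 'B3', 'F7', 'B4', 'B5', 'B6', 'B7']
```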