
Forward step return value is missing bal_loss

Open tisgotos opened this issue 5 months ago • 2 comments

I hit this problem when running pretrain_gpt.py after applying the patch:

Traceback (most recent call last):
  File "pretrain_gpt.py", line 126, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/workspace/Megatron-LM/megatron/training.py", line 157, in pretrain
    iteration = train(forward_step_func,
  File "/workspace/Megatron-LM/megatron/training.py", line 630, in train
    train_step(forward_step_func,
  File "/workspace/Megatron-LM/megatron/training.py", line 377, in train_step
    losses_reduced = forward_backward_func(
  File "/workspace/Megatron-LM/megatron/schedules.py", line 132, in forward_backward_no_pipelining
    output_tensor, bal_loss = forward_step(forward_step_func, data_iterator, model,
  File "/workspace/Megatron-LM/megatron/schedules.py", line 61, in forward_step
    output_tensor, loss_func, bal_loss = forward_step_func(data_iterator, model)
ValueError: not enough values to unpack (expected 3, got 2)

The forward_step source in pretrain_gpt.py:

def forward_step(data_iterator, model):
    """Forward step."""
    args = get_args()
    timers = get_timers()

    # Get the batch.
    timers('batch-generator').start()
    tokens, labels, loss_mask, attention_mask, position_ids = get_batch(
        data_iterator)
    timers('batch-generator').stop()

    output_tensor = model(tokens, position_ids, attention_mask,
                          labels=labels)

    return output_tensor, partial(loss_func, loss_mask)
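
For context, the patched megatron/schedules.py unpacks three values from forward_step_func (output_tensor, loss_func, bal_loss), so the stock forward_step above, which returns only two, can no longer satisfy it. Below is a minimal sketch of what an adapted forward_step could look like. It assumes the fastmoe-patched model forward returns the balance loss alongside the output tensor; whether it actually does depends on the patch version applied, so the unpacking line is an assumption, not the official fix.

def forward_step(data_iterator, model):
    """Forward step that also propagates the MoE balance loss (sketch only)."""
    timers = get_timers()

    # Get the batch.
    timers('batch-generator').start()
    tokens, labels, loss_mask, attention_mask, position_ids = get_batch(
        data_iterator)
    timers('batch-generator').stop()

    # Assumption: the patched model returns (output_tensor, bal_loss).
    # If your patched model still returns a single tensor, bal_loss has to
    # be collected some other way (e.g. from the MoE gate modules).
    output_tensor, bal_loss = model(tokens, position_ids, attention_mask,
                                    labels=labels)

    # Return three values so the patched schedules.py line
    #   output_tensor, loss_func, bal_loss = forward_step_func(data_iterator, model)
    # can unpack them.
    return output_tensor, partial(loss_func, loss_mask), bal_loss
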

tisgotos · Sep 07 '24 02:09