[QUESTION] UnboundLocalError: local variable 'output_tensor' referenced before assignment
I am pretraining Llama3-8B but hit the following error with this configuration: vp 2, pp 8, 8 GPUs:

deallocate_output_tensor(output_tensor, config.deallocate_pipeline_outputs)
UnboundLocalError: local variable 'output_tensor' referenced before assignment

(2) When I change pp from 8 to 4, it works fine.
Why? Has anyone else met this problem?
I met the same problem, and it is still unsolved.
Also encountering the same problem with BERT (32 layers, 32 GPUs, 16 PP stages, 2 layers per virtual pipeline stage).
Marking as stale. No activity in 60 days.
I bumped into a similar issue when I mistakenly specified the --num-layers-per-virtual-pipeline-stage larger than intended.
For example,
--num-layers=16
--pipeline-model-parallel-size=4
--num-layers-per-virtual-pipeline-stage=4
leads to virtual_pipeline_model_parallel_size=1, a degenerate case that doesn't seem to be anticipated.
Setting --num-layers-per-virtual-pipeline-stage to a reasonable value (2, in the case above) resolved the issue for me.
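The arithmetic behind that flag interaction can be sketched as follows. This is a hedged illustration of how the virtual pipeline size is derived from the flags above, not Megatron-LM's actual source; the function name `virtual_pp_size` is my own.

```python
# Sketch (assumed, not Megatron-LM's actual code) of how the virtual
# pipeline size follows from the three CLI flags discussed above.

def virtual_pp_size(num_layers: int,
                    pipeline_model_parallel_size: int,
                    num_layers_per_virtual_pipeline_stage: int) -> int:
    # Each pipeline stage owns num_layers / pp_size transformer layers.
    assert num_layers % pipeline_model_parallel_size == 0
    layers_per_stage = num_layers // pipeline_model_parallel_size
    assert layers_per_stage % num_layers_per_virtual_pipeline_stage == 0
    # The virtual size is how many model chunks each stage holds.
    return layers_per_stage // num_layers_per_virtual_pipeline_stage

# The problematic config from the comment: 16 layers, PP=4, and 4 layers
# per virtual stage -> virtual size 1, i.e. no interleaving at all.
print(virtual_pp_size(16, 4, 4))  # 1 -- degenerate case

# Lowering the flag to 2 yields a genuine interleaved schedule:
print(virtual_pp_size(16, 4, 2))  # 2 -- works
```

With a virtual size of 1 the "interleaved" schedule has nothing to interleave, which appears to be the path where `output_tensor` is never assigned before `deallocate_output_tensor` is called.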
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.