[QUESTION] UnboundLocalError: local variable 'output_tensor' referenced before assignment
I am pretraining Llama3-8B but hit the following error with this configuration: vp 2, pp 8, 8 GPUs:

deallocate_output_tensor(output_tensor, config.deallocate_pipeline_outputs)
UnboundLocalError: local variable 'output_tensor' referenced before assignment

(2) When I change pp from 8 to 4, it works fine.
Why? Has anyone else met this problem?
I met the same problem, and it is still unsolved.
Also encountering the same problem with BERT (32 layers, 32 GPUs, 16 PP stages, 2 layers per virtual pipeline stage).
Marking as stale. No activity in 60 days.
I bumped into a similar issue when I mistakenly specified the --num-layers-per-virtual-pipeline-stage larger than intended.
For example,
--num-layers=16
--pipeline-model-parallel-size=4
--num-layers-per-virtual-pipeline-stage=4
leads to virtual_pipeline_model_parallel_size=1, a degenerate case that doesn't seem to be anticipated.
Setting --num-layers-per-virtual-pipeline-stage to a reasonable value (2, in the case above) resolved the issue for me.
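The arithmetic behind that flag interaction can be sketched as follows. This is a hedged illustration of how the virtual pipeline size is derived from the flags above, not Megatron-LM's actual source; the function name `virtual_pp_size` is my own.

```python
# Sketch (assumed, not Megatron-LM's actual code) of how the virtual
# pipeline size follows from the three CLI flags discussed above.

def virtual_pp_size(num_layers: int,
                    pipeline_model_parallel_size: int,
                    num_layers_per_virtual_pipeline_stage: int) -> int:
    # Each pipeline stage owns num_layers / pp_size transformer layers.
    assert num_layers % pipeline_model_parallel_size == 0
    layers_per_stage = num_layers // pipeline_model_parallel_size
    assert layers_per_stage % num_layers_per_virtual_pipeline_stage == 0
    # The virtual size is how many model chunks each stage holds.
    return layers_per_stage // num_layers_per_virtual_pipeline_stage

# The problematic config from the comment: 16 layers, PP=4, and 4 layers
# per virtual stage -> virtual size 1, i.e. no interleaving at all.
print(virtual_pp_size(16, 4, 4))  # 1 -- degenerate case

# Lowering the flag to 2 yields a genuine interleaved schedule:
print(virtual_pp_size(16, 4, 2))  # 2 -- works
```

With a virtual size of 1 the "interleaved" schedule has nothing to interleave, which appears to be the path where `output_tensor` is never assigned before `deallocate_output_tensor` is called.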
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.