gpt-neox
_forward_step_fn does not always return two values, so eval.py breaks if is_pipe_parallel is false
This call to _forward_step_fn expects two return values: https://github.com/EleutherAI/gpt-neox/blob/59a5236ddaf721890e3d6ef98fb8ca66c2266ce0/eval_tasks/eval_adapter.py#L372
But forward_step can return three values: https://github.com/EleutherAI/gpt-neox/blob/59a5236ddaf721890e3d6ef98fb8ca66c2266ce0/megatron/training.py#L847
I guess I am seeing this because I have is_pipe_parallel set to false, which is uncommon. Maybe there needs to be an option not to return metrics.
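To illustrate the mismatch, here is a minimal sketch in plain Python. It does not use the gpt-neox code: `fake_forward_step`, its `return_metrics` flag, and both callers are hypothetical stand-ins, and the returned values are made up. It only shows why unpacking two names from a three-value return fails, and how a starred target would tolerate either shape.

```python
def fake_forward_step(return_metrics=True):
    """Hypothetical stand-in for a forward step (not the gpt-neox function).

    Returns (loss, logits, metrics) by default, or (loss, logits) when
    return_metrics is False. The values are placeholders.
    """
    loss, logits, metrics = 0.5, [0.1, 0.9], {"accuracy": 1.0}
    if return_metrics:
        return loss, logits, metrics
    return loss, logits


def strict_caller():
    # Expects exactly two values; raises ValueError when three come back.
    loss, logits = fake_forward_step()
    return logits


def tolerant_caller(return_metrics=True):
    # The starred target absorbs any extra trailing values (zero or more).
    loss, logits, *_extra = fake_forward_step(return_metrics)
    return logits


if __name__ == "__main__":
    try:
        strict_caller()
    except ValueError as err:
        print("strict caller failed:", err)
    print("tolerant caller got:", tolerant_caller())
```

Whether the real fix should be a tolerant caller like this or an option on the forward step to skip the metrics is a design choice for someone who knows the code base.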
There are several "fixes" in https://github.com/markNZed/gpt-neox/tree/pipe_parallel_size_1 which might be related to this. I have not had the time to prepare a PR, but if someone who knows the code base looks at the changes there, I guess they will quickly see many easy-to-fix issues.
Can confirm I've run into this issue multiple times as well, even with pipe parallel size > 1.