_forward_step_fn does not always return two values so eval.py breaks if is_pipe_parallel is false

Open markNZed opened this issue 1 year ago • 2 comments

This call to _forward_step_fn unpacks exactly two return values: https://github.com/EleutherAI/gpt-neox/blob/59a5236ddaf721890e3d6ef98fb8ca66c2266ce0/eval_tasks/eval_adapter.py#L372

The forward_step can return three values https://github.com/EleutherAI/gpt-neox/blob/59a5236ddaf721890e3d6ef98fb8ca66c2266ce0/megatron/training.py#L847

I guess I am seeing this because I have is_pipe_parallel set to false, which is uncommon. Maybe there needs to be an option to not return metrics.
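For illustration, here is a minimal sketch of one possible workaround on the caller side: accept either a two- or three-value result and drop the trailing item. The helper `first_two` and the stand-in functions are hypothetical and not part of gpt-neox. The sketch also assumes the extra third value is the metrics and that the first two values are the ones the caller wants, which I have not verified against the linked code.

```python
from typing import Any, Tuple


def first_two(result: Tuple[Any, ...]) -> Tuple[Any, Any]:
    """Return the first two items of a 2- or 3-tuple, dropping any third item.

    Hypothetical helper: assumes a third item, when present, is the metrics
    value that the caller does not need.
    """
    if not isinstance(result, tuple) or len(result) not in (2, 3):
        raise ValueError(f"expected a tuple of 2 or 3 values, got {result!r}")
    return result[0], result[1]


# Hypothetical stand-ins for a forward step that returns two or three values.
def forward_two_values():
    return 0.5, [0.1, 0.9]


def forward_three_values():
    return 0.5, [0.1, 0.9], {"lm_loss": 0.5}


# Both call shapes now unpack cleanly into two names.
loss_a, logits_a = first_two(forward_two_values())
loss_b, logits_b = first_two(forward_three_values())
```

An explicit option on the forward step to skip returning metrics, as suggested above, would probably be cleaner than guessing from the tuple length, since it keeps the return shape predictable for every caller.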

markNZed avatar Nov 12 '24 13:11 markNZed

There are several "fixes" in https://github.com/markNZed/gpt-neox/tree/pipe_parallel_size_1 which might be related to this. I have not had the time to prepare a PR, but if someone who knows the code base looks at the changes there, I guess they will quickly see many easy-to-fix issues.

markNZed avatar Nov 13 '24 16:11 markNZed

Can confirm I've run into this issue multiple times as well, even with pipe parallel size >1.

iPRET avatar Nov 14 '24 13:11 iPRET