Do these Metrics really need to be scaled?
I noticed there is some scaling with loss_scale_factor in the dp_actor:
"actor/pg_loss": pg_loss.detach().item() * loss_scale_factor,
However I am wondering if these scalings are really needed, since the reduction over the metrics is by mean and not sum?
The loss is scaled so that they are parallelism ignorant. Otherwise, you will find that these losses magnitude is different if you use different sp_size, which should not be the case
The loss is scaled so that they are parallelism ignorant. Otherwise, you will find that these losses magnitude is different if you use different sp_size, which should not be the case
@vermouth1992 yes I agree the loss should be scaled for the backward, but this was not my point. My point is only that for reporting, we shouldnt need to scale it becasue there is already a reduction outside by mean.
The key point you need to scale it for the backward is because multiple forward-backward calls will reduce by sum, but that is not the case for reporting.
The loss is scaled so that they are parallelism ignorant. Otherwise, you will find that these losses magnitude is different if you use different sp_size, which should not be the case
@vermouth1992 yes I agree the loss should be scaled for the backward, but this was not my point. My point is only that for reporting, we shouldnt need to scale it becasue there is already a reduction outside by mean.
The key point you need to scale it for the backward is because multiple forward-backward calls will reduce by sum, but that is not the case for reporting.
I see what you mean. Maybe I was wrong. Could you confirm that those values will not change its magnitude when the ulysses sp size changes? Thanks!
@vermouth1992 its kind of roughly the same
> metrics['actor/pg_loss']
[-0.3054201006889343, 0.0, -0.2829209864139557, -0.16481846570968628, -0.15159031748771667]
> self.ulysses_sequence_parallel_size
1
> metrics['actor/pg_loss']
[1.2069861888885498, -0.24875977635383606, -0.2340698093175888, -0.1880173534154892, 0.0, -0.07240534573793411, 0.0, 0.0, -0.1661381721496582]
> self.ulysses_sequence_parallel_size
2
I think the current implementation has some bugs, we'll fix it soon
Agree. Current metrics looks confusing.
Agree