
[Misc] ThroughputHook

Open flyinghu123 opened this issue 1 year ago • 1 comments

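throughput_hook.py only computes the tgs of a single micro batch. Could a global batch size tgs output be added?

When accumulative_counts > 1, the last gradient accumulation iter performs one more optim.step() than the other iters. Averaging the tgs values reported per micro batch therefore gives a result larger than the actual tgs, especially with optim offload and a small accumulative_counts.

For example, consider a single machine with a single GPU and accumulative_counts = 2. Assume the batch size is 1 and sequence_len is $s$, and let $t_1$ and $t_2$ be the times of the two iters. The tgs of the first iter is $\frac{s}{t_1}$ and the tgs of the second iter is $\frac{s}{t_2}$. Directly averaging the tgs of the two iters gives a gbs tgs of $\frac{(\frac{s}{t_1} + \frac{s}{t_2})}{2} = \frac{s (t_1 + t_2)}{2 t_1 t_2}$.

The actual gbs tgs should be $\frac{2s}{t_1 + t_2}$.

Dividing the first by the second gives $\frac{(t_1 + t_2)^2}{4 t_1 t_2} \geqslant 1$, with equality only when $t_1 = t_2$.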
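The comparison above can be checked numerically. The following is a minimal sketch in plain Python, not xtuner code; the function names and the example values (`s = 4096`, iter times of 1.0 s and 3.0 s) are made up for illustration.

```python
def mean_of_micro_tgs(tokens_per_iter, iter_times):
    """Average of the per-iter tokens/sec values (the overestimating variant)."""
    rates = [tokens_per_iter / t for t in iter_times]
    return sum(rates) / len(rates)


def global_batch_tgs(tokens_per_iter, iter_times):
    """Total tokens of the global batch divided by the total elapsed time."""
    return tokens_per_iter * len(iter_times) / sum(iter_times)


# Hypothetical numbers: the second iter is slower because it also runs optim.step().
s = 4096
iter_times = [1.0, 3.0]

naive = mean_of_micro_tgs(s, iter_times)
actual = global_batch_tgs(s, iter_times)
ratio = naive / actual  # equals (t1 + t2)^2 / (4 * t1 * t2)

print(f"mean of micro batch tgs: {naive:.2f}")
print(f"global batch tgs:        {actual:.2f}")
print(f"ratio:                   {ratio:.4f}")
```

The further apart the two iter times are, the larger the ratio becomes, which matches the observation that the gap is largest when optim.step() dominates the last iter.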

When sequence_len is fixed, the global batch size tgs can be computed by adding the following code to throughput_hook.py:

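# Log once per global batch, on the last gradient accumulation iter.
accum_steps = runner.strategy.config['gradient_accumulation_steps']
if (batch_idx + 1) % accum_steps == 0:
    # Mean iter time over the last accum_steps iters, including the optim.step() iter.
    mean_iter_time = message_hub.get_scalar('train/time').mean(accum_steps)
    message_hub.update_scalar('train/gbs_tokens_per_sec',
                              batch_size * sequence_len / (mean_iter_time + 1e-12))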
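The snippet above relies on a fixed sequence_len. If the token count varies between iters, one possible approach is to accumulate tokens and time over each global batch and divide once at the end. The sketch below is framework-independent and only illustrates that idea; the class `GlobalBatchThroughput` and its `update` method are hypothetical names, not part of xtuner or mmengine.

```python
class GlobalBatchThroughput:
    """Accumulate tokens and time over one global batch (hypothetical helper)."""

    def __init__(self, accumulation_steps):
        self.accumulation_steps = accumulation_steps
        self._tokens = 0
        self._time = 0.0
        self._count = 0

    def update(self, num_tokens, iter_time):
        """Record one iter; return tokens/sec on the last accumulation iter, else None."""
        self._tokens += num_tokens
        self._time += iter_time
        self._count += 1
        if self._count < self.accumulation_steps:
            return None
        # Total tokens over total time, so the slower optim.step() iter is weighted correctly.
        tgs = self._tokens / (self._time + 1e-12)
        self._tokens, self._time, self._count = 0, 0.0, 0
        return tgs


# Usage with made-up numbers: two iters per global batch, the second one slower.
tracker = GlobalBatchThroughput(accumulation_steps=2)
first = tracker.update(num_tokens=4096, iter_time=1.0)
second = tracker.update(num_tokens=4096, iter_time=3.0)
print(first, second)
```

With a fixed token count per iter this reduces to the same value as dividing `batch_size * sequence_len` by the mean iter time.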

flyinghu123 avatar Dec 19 '24 06:12 flyinghu123

Related issue: https://github.com/InternLM/xtuner/issues/967#issuecomment-2516042109

CokeDong avatar Dec 19 '24 07:12 CokeDong