[Misc] ThroughputHook

Open · flyinghu123 opened this issue 1 year ago · 1 comment

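`throughput_hook.py` only computes tgs for a single micro batch. Could a tgs output for the global batch size also be added?

When `accumulative_counts > 1`, the last iteration of each gradient-accumulation cycle runs one more `optim.step()` than the other iterations. Averaging the tgs values reported per micro batch therefore overestimates the actual tgs, especially when the optimizer is offloaded and `accumulative_counts` is small.

For example, consider a single node with a single GPU, `accumulative_counts` = 2, batch size 1 and sequence length $s$. The first iteration takes $t_1$ and has tgs $\frac{s}{t_1}$; the second takes $t_2$ and has tgs $\frac{s}{t_2}$. Averaging the two per-iteration values gives a global-batch tgs of

$$\frac{1}{2}\left(\frac{s}{t_1} + \frac{s}{t_2}\right) = \frac{s(t_1 + t_2)}{2t_1t_2},$$

but the actual global-batch tgs is the total number of tokens divided by the total time:

$$\frac{2s}{t_1 + t_2}.$$

The ratio of the two is

$$\frac{(t_1+t_2)^2}{4t_1t_2} \geqslant 1,$$

because $(t_1+t_2)^2 - 4t_1t_2 = (t_1-t_2)^2 \geqslant 0$, with equality only when $t_1 = t_2$.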
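To make the overestimate concrete, here is a small Python check of the two formulas above. The timings (1 s and 3 s) and $s = 4096$ are made-up illustrative numbers, not measurements, and the helper names are hypothetical.

```python
import math


def naive_gbs_tgs(step_times, tokens_per_step):
    """Mean of the per-micro-batch tgs values (what averaging the logged tgs gives)."""
    return sum(tokens_per_step / t for t in step_times) / len(step_times)


def actual_gbs_tgs(step_times, tokens_per_step):
    """Total tokens in the accumulation window divided by the total wall time."""
    return tokens_per_step * len(step_times) / sum(step_times)


# Hypothetical timings: accumulative_counts = 2, batch size 1, s = 4096 tokens.
# The second iteration is slower because it also runs optim.step().
s = 4096
t1, t2 = 1.0, 3.0
naive = naive_gbs_tgs([t1, t2], s)
actual = actual_gbs_tgs([t1, t2], s)
ratio = naive / actual
closed_form = (t1 + t2) ** 2 / (4 * t1 * t2)  # (t1+t2)^2 / (4 t1 t2)

assert math.isclose(ratio, closed_form)
print(f"naive={naive:.1f} actual={actual:.1f} ratio={ratio:.3f}")
# naive=2730.7 actual=2048.0 ratio=1.333
```

With these numbers, averaging the per-iteration tgs reports about 33% more throughput than was actually achieved, matching $\frac{(t_1+t_2)^2}{4t_1t_2} = \frac{16}{12}$.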

When `sequence_len` is fixed, the global-batch tgs can be computed by adding the following code to `throughput_hook.py`:

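```python
# Inside ThroughputHook.after_train_iter; batch_size and sequence_len are assumed
# to be the per-micro-batch values the hook already computes.
accum_steps = runner.strategy.config['gradient_accumulation_steps']
if (batch_idx + 1) % accum_steps == 0:
    # Mean time of the last accum_steps iterations, including the optim.step() one
    mean_time = message_hub.get_scalar('train/time').mean(accum_steps)
    message_hub.update_scalar('train/gbs_tokens_per_sec',
                              batch_size * sequence_len / (mean_time + 1e-12))
```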
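The snippet above relies on mmengine's `MessageHub`: `get_scalar('train/time')` returns a history buffer whose `mean(n)` averages the latest `n` recorded iteration times. The framework-free sketch below mirrors that logic with a `collections.deque` so it can run on its own; `GbsThroughputTracker` and `after_iter` are hypothetical names and the timings are invented. It also shows why `batch_size * sequence_len / mean_time` is the right quantity: one micro batch's tokens divided by the mean iteration time equals the whole window's tokens divided by the window's total time.

```python
from collections import deque


class GbsThroughputTracker:
    """Hypothetical stand-in for the hook logic above, without mmengine.

    Records each micro-batch iteration time and, on the last iteration of
    every gradient-accumulation window, returns tokens/s for the whole
    global batch (None on all other iterations).
    """

    def __init__(self, accum_steps):
        self.accum_steps = accum_steps
        self.times = deque(maxlen=accum_steps)  # most recent iteration times only

    def after_iter(self, batch_idx, batch_size, sequence_len, iter_time):
        self.times.append(iter_time)
        if (batch_idx + 1) % self.accum_steps != 0:
            return None  # not the last micro batch of the window
        mean_time = sum(self.times) / len(self.times)
        # Equivalent to accum_steps * batch_size * sequence_len / sum(self.times)
        return batch_size * sequence_len / (mean_time + 1e-12)


tracker = GbsThroughputTracker(accum_steps=2)
# Hypothetical timings: every second iteration also runs optim.step().
iter_times = [1.0, 3.0, 1.0, 3.0]
reports = [tracker.after_iter(i, 1, 4096, t) for i, t in enumerate(iter_times)]
print([None if r is None else round(r, 1) for r in reports])
# [None, 2048.0, None, 2048.0]
```

Because a value is produced only on the last iteration of each window, the average always covers exactly one accumulation cycle, including the slower `optim.step()` iteration.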

Dec 19 '24 06:12 flyinghu123

Related issue: https://github.com/InternLM/xtuner/issues/967#issuecomment-2516042109

Dec 19 '24 07:12 CokeDong