Jiahao Li

13 issues by Jiahao Li

Fixed two issues:
* Padding should be ignored in training: the labels of padding tokens should be set to `-100` so that `CrossEntropyLoss` ignores them.
* Append the correct `eos_token` to the response text....
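A minimal sketch of the two fixes described above, in plain Python rather than the PR's actual code. The helper name `build_example`, its parameters, and the token ids are hypothetical and chosen only for illustration. The value `-100` is the default `ignore_index` of `torch.nn.CrossEntropyLoss`.

```python
# Default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def build_example(prompt_ids, response_ids, eos_id, pad_id, max_len):
    """Hypothetical helper: return (input_ids, labels), both padded to max_len.

    Assumes len(prompt_ids) + len(response_ids) + 1 <= max_len, so the
    appended EOS token is never truncated.
    """
    # Append the EOS token to the response so the model learns where to stop.
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = list(input_ids)

    # Pad inputs with pad_id, but pad labels with IGNORE_INDEX so that the
    # loss skips those positions.
    pad_count = max_len - len(input_ids)
    input_ids = input_ids + [pad_id] * pad_count
    labels = labels + [IGNORE_INDEX] * pad_count
    return input_ids, labels


input_ids, labels = build_example([5, 6], [7, 8], eos_id=2, pad_id=0, max_len=8)
print(input_ids)
print(labels)
```

Padding is masked by position rather than by comparing against `pad_id`, so a real EOS token keeps its label even when a tokenizer reuses the same id for padding and EOS.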

Enable overlap of backward computation and gradient all-reduce. This produces 1.05x end-to-end speedup in SFT training with my settings. See also https://github.com/microsoft/DeepSpeed/pull/4887.

See https://interactivebrokers.github.io/tws-api/options.html for the new arg `manualOrderTime`