Hongbin Liu
### Description I am studying the chronicles_prequel, and I noticed that the last table in the chapter indicates that higher TFLOPS is achieved with Zero_Stage = 1. [Trying with ZeRO_STAGE=0/1](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/chronicles-prequel.md#48-node-contenders) Zero_stage=1...
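Are the live-stream sources updated reliably?
First of all, many thanks to the author. I have a few questions: 1. Will the live-stream sources be updated reliably? 2. Are IPv6 live-stream sources more stable than IPv4 ones? 3. How can I find live-stream sources (v4 & v6) myself?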
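### Confirmation - [X] My version is the latest; my version number matches [version](https://github.com/jxxghp/MoviePilot/releases/latest). - [X] I have searched the [issues](https://github.com/jxxghp/MoviePilot/issues) and confirmed that my problem has not been reported before. - [X] I have searched the [Telegram channel](https://t.me/moviepilot_channel) and confirmed that my problem has not been reported before. - [X] I have edited the title, replacing the placeholder "描述" ("description") with the problem I encountered. ### Current program version 1.9.15 ### Runtime environment Docker ### Issue type...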
# Description Add a flag `split_bw` to control whether wgrad is separated from backward() and scheduled in another function, to better hide the a2a communication when training MoE...
# Description Please include a brief summary of the changes, relevant motivation and context. Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the...
# What does this PR do ? [PR for main](https://github.com/NVIDIA/Megatron-LM/pull/2236) - replace `enable_deepep` with `use_flex_dispatcher` so that deepep and hybridep will be treated in the same way in 1f1b a2a...
# What does this PR do ? - replace `enable_deepep` with `use_flex_dispatcher` so that deepep and hybridep will be treated in the same way in 1f1b a2a overlap; - add...