https://github.com/hpcaitech/ColossalAI/blob/29386a54e66d7e5ca40cabf1686839fba9aac71d/applications/ChatGPT/chatgpt/models/base/critic.py#L46
For small models, the actor, critic, init-actor, and reward model can all be loaded on a single machine. But how should the PPO training process be built for an LLM?
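For context on the four roles the question names, here is a minimal, self-contained sketch of one PPO update in the RLHF setting, assuming plain PyTorch. The modules and shapes are toy stand-ins for illustration only, not the API of the linked repository:

```python
# Toy sketch of the four-model PPO layout: trainable actor and critic,
# frozen init-actor (reference policy) and frozen reward model.
import torch
import torch.nn.functional as F

vocab, hidden = 100, 32
actor = torch.nn.Linear(hidden, vocab)        # trainable policy head
critic = torch.nn.Linear(hidden, 1)           # trainable value head
init_actor = torch.nn.Linear(hidden, vocab)   # frozen reference policy
reward_model = torch.nn.Linear(hidden, 1)     # frozen reward head
for m in (init_actor, reward_model):
    m.requires_grad_(False)

states = torch.randn(4, hidden)               # toy rollout states
actions = torch.randint(0, vocab, (4,))       # toy sampled tokens

logprobs = F.log_softmax(actor(states), dim=-1).gather(1, actions[:, None]).squeeze(1)
with torch.no_grad():
    ref_logprobs = F.log_softmax(init_actor(states), dim=-1).gather(1, actions[:, None]).squeeze(1)
    # KL penalty keeps the actor close to the frozen initial policy.
    rewards = reward_model(states).squeeze(1) - 0.1 * (logprobs - ref_logprobs)

values = critic(states).squeeze(1)
advantages = (rewards - values).detach()
policy_loss = -(logprobs * advantages).mean() # simplified policy-gradient objective
value_loss = F.mse_loss(values, rewards)
(policy_loss + value_loss).backward()
```

At LLM scale the same structure holds, but the four models typically cannot share one device, which is the crux of the question.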
We want to continue fine-tuning a bloomz-7b1 model. Where can we get the model checkpoints, like the 176B one?
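For what it's worth, the base checkpoints are published on the Hugging Face Hub; a minimal loading sketch, assuming the `transformers` library (the repo ids below are the public Hub checkpoints, not files from this repository):

```python
# Load the public bloomz-7b1 checkpoint as a starting point for continued
# fine-tuning. The 176B variant is published as "bigscience/bloomz".
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-7b1")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-7b1")
```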
Hello, a quick question. In the pretraining stage, after processing the WuDao corpus, the data is first cut at length 2000 and then further cut into segments of 500. In get_input_data.py, the source_tokens and target_tokens for pretraining are constructed. Can these two span multiple documents?
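As a generic illustration of why this question arises (this is not the actual logic of get_input_data.py): when tokenized documents are concatenated into one stream before being cut into fixed-length windows, a window can cross a document boundary. The window size 4 below is a toy stand-in for the 2000/500 cut lengths:

```python
# Concatenate toy tokenized documents with an EOS separator, then slice
# the stream into fixed-length chunks; chunks may span document boundaries.
EOS = 0
docs = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]

stream = []
for d in docs:
    stream.extend(d + [EOS])

window = 4
chunks = [stream[i:i + window] for i in range(0, len(stream), window)]
print(chunks)  # [[11, 12, 13, 0], [21, 22, 0, 31], [32, 33, 34, 0]]
```

Note the second chunk mixes tokens from two documents; whether get_input_data.py allows this depends on whether it packs a concatenated stream or pads each document separately.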