Tron1994

4 comments by Tron1994

> I used the code in https://github.com/hpcaitech/ColossalAI-Examples/blob/main/language/gpt/train_gpt.py. config.py: from colossalai.nn.optimizer import HybridAdam from colossalai.zero.shard_utils import TensorShardStrategy from titans.model.gpt import gpt2_small, gpt2_36B BATCH_SIZE = 2 NUM_EPOCHS = 60 SEQ_LEN =...

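Diagnosis: Assertion `srcIndex < srcSelectDimSize` failed. Cause: I used the Yuan corpus vocab.txt, whose size does not match GPT's default vocab_size, so the value has to be changed manually. It can be set in the gpt_zero3.py config: `model = dict( type=gpt2_small, vocab_size=53227, checkpoint=True, )`. I hope this gets fixed.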
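A mismatch like this can usually be caught before training starts. The sketch below is not part of the ColossalAI example; it assumes vocab.txt holds one token per line, and the helper names `count_vocab_entries` and `find_out_of_range_ids` are hypothetical. It counts the vocabulary entries and reports any token id that would index past an embedding table with `vocab_size` rows, which is the kind of out-of-range lookup that typically triggers this assertion on the GPU.

```python
import tempfile
from pathlib import Path


def count_vocab_entries(vocab_path):
    """Count non-empty lines, assuming a one-token-per-line vocab file."""
    with open(vocab_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def find_out_of_range_ids(token_ids, vocab_size):
    """Return the ids that fall outside the valid range [0, vocab_size)."""
    return [t for t in token_ids if t < 0 or t >= vocab_size]


# Demo on a throwaway 5-entry vocab file (not the real Yuan vocab.txt).
with tempfile.TemporaryDirectory() as tmp:
    demo_vocab = Path(tmp) / "vocab.txt"
    demo_vocab.write_text("a\nb\nc\nd\ne\n", encoding="utf-8")
    n_entries = count_vocab_entries(demo_vocab)

# Ids 5 and 7 do not fit in a 5-row embedding table.
bad_ids = find_out_of_range_ids([0, 4, 5, 7], vocab_size=n_entries)
print(n_entries, bad_ids)
```

If the list of out-of-range ids is non-empty for real tokenized data, the `vocab_size` in the model config is smaller than what the tokenizer produces and needs to be raised, as in the config change above.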

I found that using a chat model as the teacher works better than using a base model.

Did you find out what the problem was? I have run many experiments, and my distil_loss is also basically flat, decreasing only very slightly. ![1731924431050](https://github.com/user-attachments/assets/9600c7ae-dc23-44c9-a43a-eec39d4111e5) ![image](https://github.com/user-attachments/assets/83d24a09-f1e4-49ff-b0cd-56cd7d033cf3)