xtuner

llama3-8b training with an expanded vocabulary fails with RuntimeError: CUDA error: device-side assert triggered

Open silvercherry opened this issue 1 year ago • 1 comment

I expanded the training vocabulary of llama3 and modified the corresponding tokenizer and model parts of the config, but training fails with RuntimeError: CUDA error: device-side assert triggered. See the screenshot for the full error.

tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=new_tokens,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16),
    tokenizer=tokenizer)

The log also shows that the model's embedding was resized (Resized token embeddings from 128256 to 129282.). How can I solve this problem?

Uploading 1.png…

silvercherry Jul 30 '24 13:07

Figured it out: the error occurs when training is launched with zero3, but not with zero2. Could the maintainers fix this?

silvercherry Jul 31 '24 02:07
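
For anyone hitting the same assert: a device-side assert during training is often caused by an out-of-range index in an embedding lookup, so one quick check is whether any token id in the data is at or beyond the number of embedding rows the model actually has. The sketch below is plain Python and is not part of xtuner. `find_out_of_range_ids` is a hypothetical helper, the sample ids are made up, and the two sizes are the ones from the log line in this issue (128256 before the resize, 129282 after). It does not explain why zero3 and zero2 behave differently; it only helps confirm or rule out an index mismatch.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    batches: Iterable[List[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (batch_index, position, token_id) for every id the embedding cannot index."""
    bad = []
    for b, ids in enumerate(batches):
        for pos, tok in enumerate(ids):
            # Valid embedding indices are 0 .. num_embeddings - 1.
            if tok < 0 or tok >= num_embeddings:
                bad.append((b, pos, tok))
    return bad


# Sizes taken from the log line quoted in the issue.
OLD_VOCAB = 128256
NEW_VOCAB = 129282

# Hypothetical token ids; 128300 stands in for one of the newly added tokens.
sample = [[1, 5, 128300], [42, 128255]]

print(find_out_of_range_ids(sample, NEW_VOCAB))  # [] : every id fits the resized table
print(find_out_of_range_ids(sample, OLD_VOCAB))  # [(0, 2, 128300)] : overflows the original table
```

If the second kind of result shows up against the size the running model really has, the tokenizer and the embedding table are out of sync. Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes CUDA kernel launches synchronous, which usually lets the traceback point at the operation that tripped the assert.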