MagicSource

Results 1257 comments of MagicSource

I want to use the official LLaVA base. I just need to add a vipProcessor to process images, right?
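For reference, a minimal sketch of what I mean, assuming the Hugging Face CLIPImageProcessor that matches LLaVA's default vision tower (the exact checkpoint name here is my assumption, not confirmed):

```
from PIL import Image
from transformers import CLIPImageProcessor

# Assumption: openai/clip-vit-large-patch14-336 is the vision tower of the
# official LLaVA base; swap in whatever tower your checkpoint was trained with.
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

image = Image.open("example.jpg").convert("RGB")
# Pixel values shaped (1, 3, 336, 336), ready to feed to the vision tower.
pixel_values = image_processor(images=image, return_tensors="pt")["pixel_values"]
```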

I believe I have already changed the conv template to the mpt format. Also, I am using the latest version of transformers, and I get:
WARNING: tokenization mismatch: 42 vs. 43. (ignored)
WARNING: tokenization mismatch: 44 vs. 45. (ignored)
WARNING: tokenization mismatch: 51 vs. 52. (ignored)
WARNING: tokenization mismatch: 45...

I noticed that you changed this part:
```
if has_image:
    round_len = len(tokenizer_image_token(rou, tokenizer)) + 1  # for eos_token
    instruction_len = len(tokenizer_image_token(parts[0], tokenizer)) - 1  # instruction_len is before the answer
else:
    round_len = len(tokenizer(rou).input_ids)
    instruction_len...
```
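For context, this is roughly where those two lengths are used. A paraphrased sketch of the LLaVA-style masking loop (not the exact upstream code), with the tokenizer helpers passed in as arguments so the fragment is self-contained:

```
IGNORE_INDEX = -100  # label value ignored by the loss

def mask_instruction_tokens(target, rounds, sep, tokenizer, tokenizer_image_token, has_image):
    # Paraphrased sketch of how round_len / instruction_len drive label masking
    # in a LLaVA-style preprocess; offsets here follow the snippet quoted above.
    cur_len = 1                      # position 0 is the BOS token
    target[:cur_len] = IGNORE_INDEX
    for rou in rounds:               # one "round" = instruction + answer
        if rou == "":
            break
        parts = rou.split(sep)       # sep is the assistant role separator
        if len(parts) != 2:
            break
        parts[0] += sep
        if has_image:
            round_len = len(tokenizer_image_token(rou, tokenizer)) + 1         # + eos
            instruction_len = len(tokenizer_image_token(parts[0], tokenizer)) - 1
        else:
            round_len = len(tokenizer(rou).input_ids)
            instruction_len = len(tokenizer(parts[0]).input_ids) - 1
        # Only the answer part of each round keeps its labels.
        target[cur_len : cur_len + instruction_len] = IGNORE_INDEX
        cur_len += round_len
    target[cur_len:] = IGNORE_INDEX
    return cur_len
```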

I tried modifying preprocess while keeping the chatml template, and the loss is still 0. In theory, compared with your template it only adds one extra eos, which should not be enough to make the loss collapse. Also, after the change I still get:
WARNING: tokenization mismatch: 49 vs. 50. (ignored)
WARNING: tokenization mismatch: 47 vs. 48. (ignored)
WARNING: tokenization mismatch: 46 vs. 47. (ignored)...
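For anyone else hitting this: the warning comes from the sanity check at the end of the preprocess function, and the important part is that a mismatched sample gets its entire label tensor masked. If every sample mismatches, nothing is supervised, which matches the loss showing 0 here. A paraphrased sketch of that check (names mirror the upstream code, but this is not a verbatim copy):

```
IGNORE_INDEX = -100  # label value ignored by the loss

def check_tokenization(target, cur_len, total_len, model_max_length):
    # Paraphrased sketch of the LLaVA-style sanity check run after label masking.
    if cur_len < model_max_length and cur_len != total_len:
        # The whole sample is dropped from supervision, not just the shifted tokens.
        target[:] = IGNORE_INDEX
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}. (ignored)")
```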

@LinB203 Yes:
```
MODEL_VERSION=qwen-1.8b
########### DO NOT CHANGE ###########
########### USE THIS FOR BOTH ###########
PROMPT_VERSION=qwen

deepspeed train_xformers.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path ./checkpoints/$MODEL_VERSION \
    --version $PROMPT_VERSION \
    --data_path ./data/llava_0.1/pretrain_data.json...
```

I changed it to plain and still got loss 0:
```
PROMPT_VERSION=plain
########### DO NOT CHANGE ###########

deepspeed train_xformers.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path ./checkpoints/$MODEL_VERSION \
    --version $PROMPT_VERSION \
    --data_path ./data/llava_0.1/pretrain_data.json \...
```
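For reference, my understanding of what plain does, as a simplified sketch of a LLaVA-style preprocess_plain; it assumes the pretrain data is (image, caption) pairs and that tokenizer_image_token comes from the training code. The human turn is reduced to the image token and only that prefix is masked, so there is no conversation template left to mismatch:

```
import copy

IGNORE_INDEX = -100
DEFAULT_IMAGE_TOKEN = "<image>"

def preprocess_plain_sketch(source, tokenizer, tokenizer_image_token, sep="\n"):
    # Simplified sketch: source = [{"value": "<image>..."}, {"value": "a caption"}]
    prompt = DEFAULT_IMAGE_TOKEN                      # the human turn becomes just <image>
    conversation = prompt + source[1]["value"] + sep  # the caption follows immediately
    input_ids = tokenizer_image_token(conversation, tokenizer, return_tensors="pt")
    labels = copy.deepcopy(input_ids)
    # Only the <image> prefix is masked; every caption token stays supervised.
    prefix_len = len(tokenizer_image_token(prompt, tokenizer))
    labels[:prefix_len] = IGNORE_INDEX
    return dict(input_ids=input_ids, labels=labels)
```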

I got it working now; the loss shows:
```
{'loss': 16.2781, 'learning_rate': 3.816793893129771e-06, 'epoch': 0.0}
{'loss': 15.7207, 'learning_rate': 7.633587786259541e-06, 'epoch': 0.0}
{'loss': 15.9175, 'learning_rate': 1.1450381679389314e-05, 'epoch': 0.0}
{'loss': 15.8711, 'learning_rate':...
```

@LinB203 The 1.5 support is very nice! So you must have upgraded to the latest transformers to support the Qwen2 tokenizer? How about using transformers' MoE architecture to minimize the custom code?
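To make the suggestion concrete, a sketch of what I have in mind: reusing the Mixtral sparse MoE block that already ships in transformers instead of a hand-rolled expert layer. The config numbers are placeholders and the internal import path is an assumption about the current transformers layout:

```
import torch
from transformers import MixtralConfig
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

# Placeholder sizes; a real setup would copy these from the Qwen-1.8B config.
config = MixtralConfig(
    hidden_size=2048,
    intermediate_size=5504,
    num_local_experts=4,
    num_experts_per_tok=2,
)

moe_block = MixtralSparseMoeBlock(config)
hidden_states = torch.randn(1, 16, config.hidden_size)   # (batch, seq, hidden)
out, router_logits = moe_block(hidden_states)             # routed through top-2 of 4 experts
print(out.shape, router_logits.shape)
```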

BTW, did you try unfreezing both the vision tower and the projector in both stage 1 and stage 2?

Is qwen-1.8b trained for 1 epoch, with the LLaVA pretrain dataset?

> BTW, did you try unfreezing both the vision tower and the projector in both stage 1 and stage 2?

Did you try this training...