weiran-work
@di-jabil Thanks for the summary! Great effort on root-causing. There are two possible reasons I can think of: 1) platform-dependent behavior, since you are exporting on macOS and compiling on...
Thanks @NanoCode012! Here's an example llama-factory config we used.

```
### model
model_name_or_path: meta-llama/Llama-3.1-8B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
# deepspeed: examples/deepspeed/ds_z2_config.json  # choices: [ds_z0_config.json,...
```
@xzuyn That's a typo from when I was cleaning up the Axolotl config. Sorry about that. I was using a local path to the Llama-3.1-8B model with Axolotl. So both experiments were using...
@ved1beta Thanks for trying.

> do i need to register the chat template manually

You can add the template to the tokenizer_config.json (see the sketch at the end of this comment).

> dataset.json file

Can you also share the...
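To make the tokenizer_config.json point concrete, here's a minimal sketch of registering a template through the tokenizer so it gets persisted alongside the tokenizer config. The Jinja string is a simplified Llama-3-style template for illustration (not the official Llama 3.1 template), and the save path is just an example.

```python
# Sketch: set a chat template on the tokenizer and save it, which persists the
# template with the tokenizer config. The template below is a simplified
# Llama-3-style example, not the official one; the output path is arbitrary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # gated repo, needs HF access
tokenizer.chat_template = (
    "{% for message in messages %}"
    "<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n\n"
    "{{ message['content'] }}<|eot_id|>"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{% endif %}"
)
tokenizer.save_pretrained("./llama3.1-8b-with-chat-template")
```

Then point the training config (base_model in Axolotl, model_name_or_path in llama-factory) at the saved directory so both trainers pick up the same template.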
@winglian We used 8xH100; the effective batch size is 128.
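For concreteness, the effective batch size is per-GPU micro-batch size × gradient accumulation steps × number of GPUs. One hypothetical split that gives 128 on 8 GPUs (the actual per-device settings aren't shown in this excerpt):

```python
# Hypothetical values; the real micro_batch_size / gradient_accumulation_steps
# from our configs aren't shown here.
num_gpus = 8                     # 8xH100
micro_batch_size = 4             # per-GPU batch size (assumed)
gradient_accumulation_steps = 4  # assumed
effective_batch_size = num_gpus * micro_batch_size * gradient_accumulation_steps
print(effective_batch_size)      # 128
```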
Based on our latest experiments, one contributing factor is FSDP1 vs FSDP2. We suspect this comes from the upstream HF implementation. We'll just stay away from FSDP2 for now. FSDP1...
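For readers less familiar with the naming: FSDP1 is PyTorch's original wrapper-class API, while FSDP2 is the newer composable per-parameter-sharding API. A quick sketch of where the two live in PyTorch (the actual selection in our runs happens inside the HF/Axolotl trainer integration, not by calling these directly; the FSDP2 import path varies by PyTorch version):

```python
# FSDP1: the original module-wrapper API
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# FSDP2: the composable per-module API (exposed as torch.distributed.fsdp.fully_shard
# in recent PyTorch releases; older versions keep it under torch.distributed._composable.fsdp)
from torch.distributed.fsdp import fully_shard

print(FSDP, fully_shard)
```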
Yes, it is with Axolotl. This is the FSDP1 config:

```
base_model: meta-llama/Llama-3.1-8B
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
chat_template: llama3
datasets:
  - ...
```
@winglian llama-factory didn't officially support FSDP2 when we tested it a few weeks ago.
@SalmanMohammadi Thank you for root-causing this! We'll give that branch a try at the end of this week or early next week. And I personally think exposing that upcasting as an...