Hasan Abed Al Kader Hammoud
@Kimiko-AI I really think this pull request is worth finishing! Very useful - I'd love to see how Prodigy performs on LLM training, having used it before on...
+1 Same issue; currently I have a hardcoded line for Llama-3 to fetch it from the Hub.
This LGTM, but while testing it out I hit what might be an issue with Phi-3 and flash-attention. On a 4xA100 node, a warning is raised when training Phi-3...
Btw, my issue here got resolved when I turned off sample packing. Maybe Phi-3's sample packing isn't compatible with Flash-Attention; see the sketch below. @brianfitzgerald
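For anyone hitting the same warning, here is a minimal sketch of the workaround, assuming an axolotl-style yml config (`flash_attention` and `sample_packing` are standard axolotl keys; the model path and values are illustrative):
```
# Workaround sketch: keep flash-attention enabled but disable sample packing,
# which is what resolved the Phi-3 warning in my runs.
base_model: microsoft/Phi-3-mini-4k-instruct  # illustrative model path
flash_attention: true    # keep FA on
sample_packing: false    # turning this off avoided the warning
```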
FYI https://github.com/OpenAccess-AI-Collective/axolotl/issues/1683 @winglian @brianfitzgerald
This might be an issue related to the Hugging Face transformers library - I'm having the same error in a different setting.
@williambarberjr you could probably pass `max_length: 8192` in the yml file:
```
datasets:
  - path: williambarberjr/L3_8B_Instruct_MarkdownToSummaryConvert
    type: chat_template
    chat_template: llama3
    max_length: 8192
    field_messages: messages
    message_field_role: role
    message_field_content: content
    roles:
      user:
        -...
```
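If samples still get truncated at the old length, the global context window likely needs to agree as well; a minimal sketch, assuming axolotl's top-level `sequence_len` key (value illustrative):
```
# Assumption: the global context length should match the dataset's max_length;
# in axolotl this is set via the top-level sequence_len key.
sequence_len: 8192
```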
@Ahmedn1 was this ever resolved on your end? I'm seeing something similar unless I apply the multipack attn patch.
@Ahmedn1 what are you currently using?