NanoCode012

255 comments by NanoCode012

Is the API exactly the same? From a quick look, I see `dropout` is not passed in `v3`. What in particular do we need to change on our end, or is it...
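
A minimal sketch of how a caller could work around the missing `dropout` argument, assuming the v3 interface otherwise mirrors FA2. Only the FA2 import below (`flash_attn.flash_attn_func`) is a known API; `fa3_flash_attn_func` is a placeholder for whatever the hopper build exposes and its signature is an assumption.

```python
# Sketch only: branch on backend because the v3 kernels reportedly take no dropout argument.
from flash_attn import flash_attn_func as fa2_flash_attn_func  # FA2: known, stable API


def dispatch_attention(q, k, v, dropout_p=0.0, causal=True, fa3_flash_attn_func=None):
    """Route to FA3 when it is available and dropout is unused, else fall back to FA2."""
    if fa3_flash_attn_func is not None and dropout_p == 0.0:
        # FA3 call (assumed): no dropout_p parameter in v3
        return fa3_flash_attn_func(q, k, v, causal=causal)
    # FA2 call (known API): dropout_p is supported here
    return fa2_flash_attn_func(q, k, v, dropout_p=dropout_p, causal=causal)
```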

Do you know whether FA3 dropped support for any non-Hopper arch compared to FA2? I recall FA2 dropped support for Turing, and it was never added back.
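
A rough sketch of how a backend could be gated by compute capability, under the assumption (consistent with the recollection above) that the v3 kernels target Hopper (sm90) while FA2 requires Ampere or newer and no longer covers Turing. The thresholds are illustrative, not verified support matrices.

```python
import torch


def pick_flash_attn_backend() -> str:
    """Pick a flash-attention backend from the GPU's compute capability (sketch)."""
    if not torch.cuda.is_available():
        return "none"
    major, minor = torch.cuda.get_device_capability()
    if (major, minor) >= (9, 0):   # Hopper (e.g. H100): assumed FA3 target
        return "fa3"
    if (major, minor) >= (8, 0):   # Ampere / Ada: FA2
        return "fa2"
    return "none"                  # Turing and older: assumed unsupported by FA2/FA3
```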

> Small script to compile the kernels is seen below. I think this has a lot of potential :)
>
> ```
> git clone https://github.com/Dao-AILab/flash-attention.git
> cd flash-attention/hopper
> ...
> ```

Transformers added FA v3 support upstream. I think we just need to add a change and set the `attn_implementation` now. This could be follow-up work after the attention refactor is in.
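
A sketch of what setting the `attn_implementation` could look like once the v3 kernels are installed. The `"flash_attention_3"` string is an assumption based on the upstream support mentioned above; the model id is just an example.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",                 # example model id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_3",   # assumed key; "flash_attention_2" is the existing FA2 one
)
```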

Confirmed to work by the reporting author @Rexhaif: https://github.com/axolotl-ai-cloud/axolotl/issues/2693#issuecomment-2894221212

On a 100k-sample dummy tool dataset with 6 turns of short content, the time taken to tokenize went from a three-run average of 83.3s to 47.6s (decreased by about...

@michaellin99999 , hey! From my understanding, those scripts should work on any system, as Lambda just provides bare compute. Can you let us know if you still get this issue...

I think this is a duplicate of #2610. Can you give upgrading your DeepSpeed version a try?

@xiao10ma , I'm not familiar with that stack trace. I believe you may be using custom code and the error is outside of Axolotl?

Hey @xiao10ma , try to keep package versions within those listed in requirements.txt. These are the versions we check against. Can you see if upgrading the...