NanoCode012

255 comments by NanoCode012

Is the API exactly the same? From a quick look, I see `dropout` is not passed in `v3`. What in particular do we need to change on our end, or is it...
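
A minimal sketch of how a caller could work around the missing `dropout` argument, assuming the v3 interface otherwise mirrors FA2. Only the FA2 import below (`flash_attn.flash_attn_func`) is a known API; `fa3_flash_attn_func` is a placeholder for whatever the hopper build exposes and its signature is an assumption.

```python
# Sketch only: branch on backend because the v3 kernels reportedly take no dropout argument.
from flash_attn import flash_attn_func as fa2_flash_attn_func  # FA2: known, stable API


def dispatch_attention(q, k, v, dropout_p=0.0, causal=True, fa3_flash_attn_func=None):
    """Route to FA3 when it is available and dropout is unused, else fall back to FA2."""
    if fa3_flash_attn_func is not None and dropout_p == 0.0:
        # FA3 call (assumed): no dropout_p parameter in v3
        return fa3_flash_attn_func(q, k, v, causal=causal)
    # FA2 call (known API): dropout_p is supported here
    return fa2_flash_attn_func(q, k, v, dropout_p=dropout_p, causal=causal)
```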

Do you know whether FA3 dropped support for any non-Hopper arch compared to FA2? I recall FA2 dropped support for Turing, and it was never added back.
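
A rough sketch of how a backend could be gated by compute capability, under the assumption (consistent with the recollection above) that the v3 kernels target Hopper (sm90) while FA2 requires Ampere or newer and no longer covers Turing. The thresholds are illustrative, not verified support matrices.

```python
import torch


def pick_flash_attn_backend() -> str:
    """Pick a flash-attention backend from the GPU's compute capability (sketch)."""
    if not torch.cuda.is_available():
        return "none"
    major, minor = torch.cuda.get_device_capability()
    if (major, minor) >= (9, 0):   # Hopper (e.g. H100): assumed FA3 target
        return "fa3"
    if (major, minor) >= (8, 0):   # Ampere / Ada: FA2
        return "fa2"
    return "none"                  # Turing and older: assumed unsupported by FA2/FA3
```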

> Small script to compile the kernels is seen below. I think this has a lot of potential :)
>
> ```
> git clone https://github.com/Dao-AILab/flash-attention.git
> cd flash-attention/hopper
> ...
> ```

Transformers added FA v3 support upstream. I think we just need to add a change and set the `attn_implementation` now. This could be follow-up work after the attention refactor is in.
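
A sketch of what setting the `attn_implementation` could look like once the v3 kernels are installed. The `"flash_attention_3"` string is an assumption based on the upstream support mentioned above; the model id is just an example.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",                 # example model id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_3",   # assumed key; "flash_attention_2" is the existing FA2 one
)
```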

Confirmed to work by the reporting author @Rexhaif: https://github.com/axolotl-ai-cloud/axolotl/issues/2693#issuecomment-2894221212

On a 100k-sample dummy tool dataset with 6 turns of short content, the time taken to tokenize went from a three-run average of 83.3s to 47.6s (decreased by about...

@michaellin99999 , hey! From my understanding, those scripts should work on any system, as Lambda just provides bare compute. Can you let us know if you still get this issue...

I think this is a duplicate of #2610. Can you give upgrading your DeepSpeed version a try?

@xiao10ma , I'm not familiar with that stack trace. I believe you may be using custom code and the error is outside of Axolotl?

Hey @xiao10ma , try to keep package versions within those listed in requirements.txt. These are the versions we check against. Can you see if upgrading the...