2 comments of JACKPURCELL
Hey, everyone. You may consider using the code from this paper: https://github.com/JACKPURCELL/AutoRAN-public
One warning: the documentation says nothing about the FSDP dtype. Flash Attention 2.0 only supports the torch.float16 and torch.bfloat16 dtypes, but the current dtype in Qwen3ForCausalLM is torch.float32. You should run training...
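A minimal sketch of the dtype fix, assuming the training script loads the model with Hugging Face `transformers` (the checkpoint name and `from_pretrained` arguments below are illustrative, not taken from the repo):

```python
import torch

# Flash Attention 2 requires fp16/bf16 weights; torch modules default to float32.
layer = torch.nn.Linear(8, 8)  # stand-in for the causal LM's weights
assert layer.weight.dtype == torch.float32

# Cast to bfloat16 before training, as the warning suggests.
layer = layer.to(torch.bfloat16)
assert layer.weight.dtype == torch.bfloat16

# With transformers, the equivalent would look like this (illustrative; the
# checkpoint name is hypothetical and this line downloads weights, so it is
# left commented out):
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen3-8B",
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
# )
```

When training under FSDP, the same dtype should also be reflected in the mixed-precision settings so that sharded parameters are not kept in float32.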