What are the current limitations of QDoRA in Unsloth? I can't get it to work with FA2. It does work without FA2, but only at very low context lengths. Is FA2 with QDoRA supposed to be supported in the current version of Unsloth or not?
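For context, this is roughly how I'm enabling QDoRA (a minimal sketch, not my exact script; the model name and hyperparameters are placeholders, and I'm assuming `get_peft_model` forwards `use_dora` through to PEFT's `LoraConfig`):

```python
from unsloth import FastLanguageModel

# Placeholder base model and context length; the real run uses a Yi-34B
# derivative at a longer max_seq_length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "01-ai/Yi-34B",
    max_seq_length = 4096,
    load_in_4bit = True,   # QDoRA = DoRA adapters on 4-bit quantized weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 64,
    use_dora = True,       # assumed to be passed through to peft.LoraConfig
)
```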
Here's the traceback from the FA2 failure in case you want to take a look.
```
Unsloth cannot patch MLP layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth cannot patch Attention layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth cannot patch O projection layer with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth 2024.4 patched 60 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
trainable params: 249,630,720 || all params: 34,638,547,968 || trainable%: 0.7206731651413778
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 107,714 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 13,464
 "-____-"     Number of trainable parameters = 249,630,720
0%| | 0/13464 [00:00<?, ?it/s]Traceback (most recent call last):
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth3/configs/yi-34b-xlctx-aezakmi-sft-2104-dora.py", line 125, in
sft_trainer.train()
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "", line 361, in _fast_inner_training_loop
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/accelerate/utils/operations.py", line 825, in forward
return model_forward(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/accelerate/utils/operations.py", line 813, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
return self.base_model(
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
outputs = self.model(
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/unsloth/models/llama.py", line 680, in LlamaModel_fast_forward
layer_outputs = decoder_layer(
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/unsloth/models/llama.py", line 433, in LlamaDecoderLayer_fast_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/unsloth/models/llama.py", line 359, in LlamaAttention_fast_forward
A = flash_attn_func(Q, K, V, causal = True)
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 831, in flash_attn_func
return FlashAttnFunc.apply(
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 511, in forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_forward(
File "/media/adamo/82142F79142F6EFB/ProgramData/Anaconda3/envs/unsloth4/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 51, in _flash_attn_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: FlashAttention only support fp16 and bf16 data type
0%| | 0/13464 [00:00<?, ?it/s]
```
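From the last frame it looks like Q, K and V reach `flash_attn_func` in fp32 once the DoRA path bypasses Unsloth's patched kernels (note the "0 QKV layers" patch summary above). Would casting them down just before the call be a sane stopgap while this gets debugged? A minimal sketch of what I mean (assuming bf16 is the intended compute dtype; this only sidesteps the dtype check and may mask a real upstream casting bug):

```python
import torch
from flash_attn import flash_attn_func

def flash_attn_forward_cast(Q, K, V, dtype = torch.bfloat16):
    # FlashAttention only accepts fp16/bf16 inputs; with DoRA enabled the
    # tensors appear to arrive in fp32, so cast them down before the call.
    # Debugging sketch only, not a proposed fix.
    Q, K, V = Q.to(dtype), K.to(dtype), V.to(dtype)
    return flash_attn_func(Q, K, V, causal = True)
```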