ServientShao
@alex-me I also tried `attn_implementation="eager"` when loading the model, but I still get this error on a specific input. I am using an A100.
Hey, I found a workaround: `import torch` `torch.backends.cuda.enable_mem_efficient_sdp(False)` `torch.backends.cuda.enable_flash_sdp(False)` `torch.backends.cuda.enable_math_sdp(True)`. This forces the math SDPA backend, so it may slow down performance.
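If the slowdown matters, one option is to apply the workaround only around the call that fails instead of globally. The sketch below is not from this thread's authors: `math_sdp_only` is a hypothetical helper name, and it assumes the three `torch.backends.cuda` toggles above plus their matching `*_enabled()` query functions. It saves the current flags, forces the math backend, and restores the flags on exit.

```python
import contextlib

import torch


@contextlib.contextmanager
def math_sdp_only():
    """Hypothetical helper: temporarily force the math SDPA backend."""
    # Remember the current global settings so they can be restored afterwards.
    prev_flash = torch.backends.cuda.flash_sdp_enabled()
    prev_mem_efficient = torch.backends.cuda.mem_efficient_sdp_enabled()
    prev_math = torch.backends.cuda.math_sdp_enabled()

    # Same three calls as the workaround above.
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    try:
        yield
    finally:
        # Restore whatever was configured before entering the block.
        torch.backends.cuda.enable_mem_efficient_sdp(prev_mem_efficient)
        torch.backends.cuda.enable_flash_sdp(prev_flash)
        torch.backends.cuda.enable_math_sdp(prev_math)


# Usage (model and inputs are placeholders for your own objects):
# with math_sdp_only():
#     outputs = model(**inputs)
```

Everything outside the `with` block keeps using the faster fused kernels, so only the problematic input pays the cost.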
Same here: A100, torch 2.6 + cu12.6.
> `torch==2.6.0+cu126` I'm getting the same error, and downgrading didn't work for me. Any more solutions? Try `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124` instead of `pip3 install...
@shimmyshimmer Yes, I tried > gradient checkpointing = unsloth and it made no difference to the context length.