candle
Short context length on Qwen quantized examples.
Every quantized example I have run so far seems to have a 1024-token limit.
cargo run --example quantized-qwen3 --release --features cuda,cudnn -- --which 4b --prompt "1802tokens later..."
Error: shape mismatch on target dim, dst: 1024, src: 1802 + 0
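To illustrate what the error appears to describe, here is a minimal self-contained sketch. It assumes (not verified against candle's source) that the example writes the prompt into a buffer preallocated with 1024 token slots, and that the trailing `+ 0` in the message is the write offset. `FixedCache` and its methods are hypothetical names for illustration, not candle types.

```rust
/// Hypothetical stand-in for a preallocated cache; NOT a candle type.
struct FixedCache {
    capacity: usize, // token slots allocated up front (assumed to be 1024)
    len: usize,      // slots already filled, used as the write offset
}

impl FixedCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, len: 0 }
    }

    /// Try to write `src` tokens starting at the current offset.
    /// Fails when the write would run past the preallocated capacity.
    fn append(&mut self, src: usize) -> Result<(), String> {
        if self.len + src > self.capacity {
            // Message shaped like the one in the report, for comparison only.
            return Err(format!(
                "shape mismatch on target dim, dst: {}, src: {} + {}",
                self.capacity, src, self.len
            ));
        }
        self.len += src;
        Ok(())
    }
}

fn main() {
    let mut cache = FixedCache::new(1024);
    // An 1802-token prompt written in one step at offset 0.
    match cache.append(1802) {
        Ok(()) => println!("prompt fits"),
        Err(e) => println!("Error: {e}"),
    }
}
```

If this reading is right, any prompt longer than 1024 tokens would fail on the first forward pass regardless of the model size selected with `--which`.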