Eric Buehler

543 comments by Eric Buehler

@TimDouglas2 tagging to let you know I am closing this.

This PR is based on the following reference ggml quantization/dequantization functions:

- Dequantization: https://github.com/ggml-org/llama.cpp/blob/7a2c913e66353362d7f28d612fd3c9d51a831eda/ggml/src/ggml-quants.c#L2434-L2475
- Quantization: https://github.com/ggml-org/llama.cpp/blob/7a2c913e66353362d7f28d612fd3c9d51a831eda/ggml/src/ggml-quants.c#L4562-L4745
- Vec dot: https://github.com/ggml-org/llama.cpp/blob/7a2c913e66353362d7f28d612fd3c9d51a831eda/ggml/src/ggml-cpu/ggml-cpu-quants.c#L11670-L12233
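For readers who haven't looked at ggml's block-quantized layouts, here is a minimal sketch in Rust of the pattern those reference functions follow, assuming a Q8_0-style block (32 int8 weights sharing one f16 scale). The exact formats ported in this PR follow the linked ggml sources; the names `BlockQ8_0`, `quantize_block`, and `dequantize_block` here are illustrative only.

```rust
use half::f16;

/// Illustrative Q8_0-style block: 32 quantized weights sharing one scale.
/// (A sketch only; the PR follows the linked ggml code exactly.)
const QK8_0: usize = 32;

#[repr(C)]
struct BlockQ8_0 {
    d: f16,           // per-block scale
    qs: [i8; QK8_0],  // quantized values
}

/// Quantize one block: d = max(|x|) / 127, then round x / d to i8,
/// mirroring ggml's reference quantization pattern.
fn quantize_block(x: &[f32; QK8_0]) -> BlockQ8_0 {
    let amax = x.iter().fold(0f32, |m, v| m.max(v.abs()));
    let d = amax / 127.0;
    let id = if d != 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0i8; QK8_0];
    for (q, &v) in qs.iter_mut().zip(x.iter()) {
        *q = (v * id).round() as i8;
    }
    BlockQ8_0 { d: f16::from_f32(d), qs }
}

/// Dequantize one block: y[i] = d * qs[i].
fn dequantize_block(block: &BlockQ8_0, out: &mut [f32; QK8_0]) {
    let d = block.d.to_f32();
    for (y, &q) in out.iter_mut().zip(block.qs.iter()) {
        *y = d * q as f32;
    }
}
```

The vec-dot kernels linked above follow the same idea: accumulate integer dot products per block, then apply the per-block scales, which is why quantization and the dot product are ported together.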

@Murad-Awad our SDPA implementation is currently specialized for Metal, and only for the decode phase, where there is no masking. For CUDA, the equivalent would most likely be to use...
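A minimal sketch of what such a guarded dispatch could look like in candle-style Rust. The dispatch conditions (Metal device, single query token, no mask) come from the comment above; `sdpa_general` and the dispatch function itself are hypothetical names, not the actual implementation.

```rust
use candle_core::{Device, Result, Tensor, D};
use candle_nn::ops::softmax;

/// General SDPA path: softmax(q k^T / sqrt(d)) v, with an optional additive mask.
fn sdpa_general(q: &Tensor, k: &Tensor, v: &Tensor, mask: Option<&Tensor>) -> Result<Tensor> {
    let scale = 1.0 / (q.dim(D::Minus1)? as f64).sqrt();
    let mut att = (q.matmul(&k.transpose(D::Minus2, D::Minus1)?)? * scale)?;
    if let Some(mask) = mask {
        att = att.broadcast_add(mask)?;
    }
    softmax(&att, D::Minus1)?.matmul(v)
}

/// Hypothetical dispatch: only take the specialized path on Metal, during
/// decode (a single query token), and when there is no mask.
fn sdpa(q: &Tensor, k: &Tensor, v: &Tensor, mask: Option<&Tensor>) -> Result<Tensor> {
    let is_decode = q.dim(2)? == 1; // layout assumed (b, heads, seq, head_dim)
    if matches!(q.device(), Device::Metal(_)) && is_decode && mask.is_none() {
        // The real code would call the fused Metal SDPA kernel here;
        // falling back to the general path keeps this sketch runnable.
        sdpa_general(q, k, v, None)
    } else {
        sdpa_general(q, k, v, mask)
    }
}
```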

Hi @polarathene thanks for the analysis, much appreciated 🫡! I will trigger a new build for v0.6.0 when I release it. I plan for that to be over the weekend,...

Hey @Murad-Awad! We have [candle-extensions](https://github.com/huggingface/candle-extensions) now, and you can use the [candle-flash-attn-v1](https://crates.io/crates/candle-flash-attn-v1) crate. The function is a [1:1 drop-in replacement](https://github.com/huggingface/candle-extensions/blob/612d5191f57bdc5b9a77659bc5834853793dc9fd/candle-flash-attn-v1/src/lib.rs#L252) for the v2 implementation here in Candle. Let me know...
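Assuming the v1 crate keeps the same `flash_attn(q, k, v, softmax_scale, causal)` signature as candle's v2 implementation (which the 1:1 drop-in claim implies), swapping it in might look like this sketch:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::new_cuda(0)?;
    // flash-attn expects (batch, seq_len, num_heads, head_dim) in f16/bf16.
    let q = Tensor::randn(0f32, 1.0, (1, 128, 32, 64), &device)?.to_dtype(DType::F16)?;
    let k = q.clone();
    let v = q.clone();
    let softmax_scale = 1.0 / (64f32).sqrt();
    // Drop-in swap: same call shape as candle_flash_attn::flash_attn in candle.
    let out = candle_flash_attn_v1::flash_attn(&q, &k, &v, softmax_scale, /* causal */ true)?;
    println!("{:?}", out.shape());
    Ok(())
}
```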

Interesting find: F16 fails (produces NaN) on an A100, but not an H100.

@coreylowman sorry for not getting back! I am running this on my GPU and PyTorch can see it (`torch.cuda.is_available() == True`).

I am using `cuda-version-from-build-system` and `dynamic-linking`. How should I try dynamic loading?

Hmm yeah, same error. Current:

```toml
cudarc = { version = "0.11.5", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-12020"], default-features = false }
```
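For the dynamic-loading question above, a sketch of the Cargo.toml change, assuming cudarc's `dynamic-loading` feature is the counterpart to `dynamic-linking`:

```toml
# Hypothetical change: swap dynamic-linking for dynamic-loading, so the
# CUDA libraries are dlopen'd at runtime instead of linked at build time.
cudarc = { version = "0.11.5", default-features = false, features = [
    "std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16",
    "cuda-12020", "dynamic-loading",
] }
```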

Looks exciting! I wonder how complicated it would be to build this on top of Conv2d?