AlpinDale comments

Results: 170 comments of AlpinDale

[Feature]: Exllamav2 Q4, Q6, and Q8 cache

It's definitely a planned feature. I believe @sgsdxzy wanted to work on it.

Initial fetch for `config.json` ignores `--revision`?

I know what's happening. Will fix soon.

[Bug]: Issue when trying to load a AWQ model with --load-in-4bits for mixtral flavors

Please remove the `--quantization awq` part and try again.

[Bug]: Issue when trying to load a AWQ model with --load-in-4bits for mixtral flavors

As of v0.6.0, the `--load-in-{4bit,8bit,smooth}` args have been removed. Please use `-q fp8` instead.

[Feature]: Is there a reason CUDA 6.1 is the minimum? Would CUDA 6.0 on the P100 not work?

It's mostly due to the QuIP# kernels. I'll look into extending support to P100s (we used to support them before) tomorrow.

[Feature]: Is there a reason CUDA 6.1 is the minimum? Would CUDA 6.0 on the P100 not work?

Please check #444. It builds for sm_60, but I haven't tested if it actually runs.
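For context, the "CUDA 6.0" and "CUDA 6.1" in the issue title refer to GPU compute capability, which the compiler targets as `sm_60` and `sm_61`. Here is a minimal sketch of the kind of check involved, assuming the minimum really is a plain (major, minor) comparison. The helper names and the `OLD_MINIMUM` / `NEW_MINIMUM` constants are hypothetical and not taken from the Aphrodite codebase; on a real machine the capability tuple could come from `torch.cuda.get_device_capability()`.

```python
from typing import Tuple

Capability = Tuple[int, int]  # (major, minor), e.g. (6, 0) for a P100


def arch_name(capability: Capability) -> str:
    """Format a compute capability as an nvcc-style arch name, e.g. sm_60."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_supported(capability: Capability, minimum: Capability) -> bool:
    """Tuples compare element-wise, so (6, 0) >= (6, 1) is False."""
    return capability >= minimum


P100 = (6, 0)
OLD_MINIMUM = (6, 1)  # assumption: the minimum discussed in the issue title
NEW_MINIMUM = (6, 0)  # assumption: the minimum once sm_60 is built

print(arch_name(P100), is_supported(P100, OLD_MINIMUM))
print(arch_name(P100), is_supported(P100, NEW_MINIMUM))
```

Passing this check only reflects which architectures were compiled in. As the comment above says, building for `sm_60` does not yet confirm that the kernels actually run on a P100.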

[Feature]: Is there a reason CUDA 6.1 is the minimum? Would CUDA 6.0 on the P100 not work?

@online2311 We forgot to bump the build architectures in the Dockerfile; this will be fixed in the next release. If you want to build it yourself, edit the Dockerfile like...

[Usage]: Please provide the environment variable that closes the KoboldAI Lite page.

Sorry, I totally missed this! I'll take care of it soon.

[Crash]: Program gets terminated

This seems unrelated to aphrodite. Could be a host/port issue?

[Bug]: WSL Cuda out of Memory when Trying to Load GGUF Model

Try disabling CUDA graphs with `--enforce-eager`; that should help.