Casper
> RuntimeError: CUDA error: an illegal memory access was encountered

Looks like you might be running out of memory. Which GPU are you using to load the model? EDIT: If...
> I tried shifting to CUDA-11.8 but facing the same error. Any insights regarding this would be really appreciated. Thanks

I am not sure what your specific issue is. Can...
It looks like you are trying to modify `demo.py`, and I can't be sure exactly what is going on. I have been working on a refactoring of AWQ. Can...
> `demo.py` keeps the history, so every consecutive prompt adds to the context until it exceeds the maximum context length the model can support, and finally you will be able to see the error when...
I will investigate this in the future. You should be able to keep the same context length without problems; maybe it's just something to do with how the config is being...
I have now investigated what is happening. Hugging Face transformers/accelerate does not automatically load the maximum sequence length into the model, which causes some problems. I will aim to solve this in...
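To illustrate the issue described above, here is a minimal sketch (hypothetical helper, not the repo's actual code) of keeping an accumulating chat history under a model's maximum sequence length by dropping the oldest turns first; `count_tokens` is a crude stand-in for a real tokenizer:

```python
# Hypothetical sketch: cap a rolling chat history at a model's maximum
# sequence length by dropping the oldest turns first.

def count_tokens(text: str) -> int:
    # Crude whitespace approximation; in practice, use the model's
    # own tokenizer to count tokens.
    return len(text.split())

def trim_history(history: list[str], max_seq_len: int) -> list[str]:
    """Drop the oldest turns until the total token count fits."""
    trimmed = list(history)
    while trimmed and sum(count_tokens(t) for t in trimmed) > max_seq_len:
        trimmed.pop(0)
    return trimmed

history = ["hello there friend", "how are you today", "tell me a story"]
print(trim_history(history, max_seq_len=8))  # oldest turn is dropped
```

Without a cap like this, each consecutive prompt in `demo.py` grows the context until it exceeds the model's limit, which is when the error surfaces.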
> We have integrated this incredible work into our project [LMDeploy](https://github.com/InternLM/lmdeploy), which completes the LLM deployment toolkit, including compressing, inference, and serving.
>
> Additionally, by extensively optimizing the W4A16...
> [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) may be a good repo to look at. First of all, there should be an `AutoModelForCausalLM.from_quantized` method, similar to the `from_pretrained` method, to load the AWQ models from a...
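A rough sketch of what such a `from_quantized` entry point could look like, modeled on AutoGPTQ's loader pattern; the class name, argument names, and config fields here are illustrative assumptions, not the actual AWQ API:

```python
# Hypothetical sketch of a `from_quantized` classmethod in the style of
# AutoGPTQ's loader. All names here are illustrative, not a real API.
from dataclasses import dataclass

@dataclass
class QuantConfig:
    w_bit: int = 4          # weight bit-width
    group_size: int = 128   # quantization group size

class AutoAWQForCausalLM:
    def __init__(self, model_path: str, config: QuantConfig):
        self.model_path = model_path
        self.config = config

    @classmethod
    def from_quantized(cls, model_path: str, w_bit: int = 4,
                       group_size: int = 128) -> "AutoAWQForCausalLM":
        """Mirror `from_pretrained`: resolve the checkpoint, build the
        quantization config, and return a ready-to-use wrapper."""
        # A real implementation would locate and load the quantized
        # weights here instead of just storing the path.
        return cls(model_path, QuantConfig(w_bit=w_bit, group_size=group_size))

model = AutoAWQForCausalLM.from_quantized("org/model-awq", w_bit=4)
print(model.config.w_bit)  # → 4
```

The appeal of this pattern is that users who already know `from_pretrained` can load a quantized checkpoint with a single, familiar call.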
@abhinavkulkarni I have created a draft PR #72 that I have gotten pretty far with. I will likely need some people to test it as the code is semi-close to...
@benyang0506 I have an explanation to offer you. The benchmarks provided by this repository are correct, but they fail to mention the CPU used to measure the latency, which plays...