Casper

293 comments by Casper

> https://github.com/qwopqwop200/llm-awq This is the AWQ code that has been changed to save in a format similar to GPTQ. The differences are:
>
> 1. Some variables have been renamed....
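For context, here is a minimal sketch of what a GPTQ-style packed layer typically stores, which is the kind of format the linked fork appears to target. The tensor names (qweight, qzeros, scales, g_idx) follow the common GPTQ packing convention, and the shapes and layer prefix below are synthetic placeholders, not taken from that repository.

```python
# Hypothetical illustration of a GPTQ-style packed linear layer.
import torch

def describe_gptq_layer(state_dict: dict, prefix: str) -> None:
    """Print the tensors a GPTQ-packed linear layer usually carries."""
    for name in ("qweight", "qzeros", "scales", "g_idx"):
        key = f"{prefix}.{name}"
        if key in state_dict:
            t = state_dict[key]
            print(f"{key}: shape={tuple(t.shape)}, dtype={t.dtype}")
        else:
            print(f"{key}: missing (optional for some packers)")

# Tiny synthetic example standing in for a real quantized checkpoint:
sd = {
    "layer.qweight": torch.zeros(128, 64, dtype=torch.int32),  # packed int weights
    "layer.qzeros": torch.zeros(4, 64, dtype=torch.int32),     # packed zero-points
    "layer.scales": torch.ones(4, 512, dtype=torch.float16),   # per-group scales
}
describe_gptq_layer(sd, "layer")
```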

> FYI: A new quantization technique, SqueezeLLM, which seems promising, has been released 3 days ago, [github](https://github.com/SqueezeAILab/SqueezeLLM), [paper](https://arxiv.org/pdf/2306.07629.pdf)

This looks good after reviewing. The main conclusion is that SqueezeLLM is...

Here is the specific line of code in this repository that prevents packages like AutoGPTQ from quantizing your MPT models: https://github.com/mosaicml/llm-foundry/blob/main/llmfoundry/models/mpt/modeling_mpt.py#L285
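As a hedged illustration (not the exact content of that line), a generic way to see where external tooling trips over a model is to inspect which keyword arguments its `forward()` actually accepts; the checkpoint name and the kwargs checked below are assumptions.

```python
# Illustrative only: check which kwargs a model's forward() accepts.
import inspect
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",  # assumed example checkpoint
    trust_remote_code=True,
)
sig = inspect.signature(model.forward)
for kwarg in ("output_attentions", "output_hidden_states", "inputs_embeds"):
    print(kwarg, "supported" if kwarg in sig.parameters else "not supported")
```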

> output the attention matrices

Yes, the algorithm will need attention matrices for quantization.

> with flash attention

I cannot give you a definitive answer here - the question is...
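As a minimal sketch of the standard Hugging Face path for getting attention matrices, assuming a model whose eager attention implementation materializes them (gpt2 is used here as a stand-in); fused flash-attention kernels never build the full attention matrix, which is exactly why the flash path is uncertain.

```python
# Sketch: ask a Hugging Face model to return per-layer attention matrices.
# gpt2's eager attention path materializes the full [batch, heads, seq, seq]
# tensors; fused flash-attention kernels do not.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Calibration sample for quantization.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

print(len(out.attentions), out.attentions[0].shape)
```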

> Working on the device_map issue as part of #225

Thank you, this should enable better compatibility with Hugging Face and allow more applications to be built on top of MPT!
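For reference, a minimal sketch of the loading path that device_map support should unlock, where Accelerate shards the checkpoint across available devices at load time; the checkpoint name and settings here are assumed examples.

```python
# Sketch: load a model sharded across available GPUs/CPU via Accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",      # assumed example checkpoint
    trust_remote_code=True,
    device_map="auto",      # requires the model to support Accelerate's hooks
    torch_dtype="auto",
)
tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
print(model.hf_device_map)  # shows which device each module landed on
```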

Hi @abhi-mosaic - development is still ongoing. The recent improvements in foundry have made it much more manageable. A PR is open in [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ/pull/73#issuecomment-1578643013) where we are trying to quantize....
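For reference, a rough sketch of the AutoGPTQ-style quantization flow the PR is aiming at; the checkpoint name, calibration text, and bit settings below are illustrative assumptions, not the PR's actual configuration.

```python
# Hedged sketch of an AutoGPTQ-style 4-bit quantization run.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

name = "mosaicml/mpt-7b"  # assumed example checkpoint
tok = AutoTokenizer.from_pretrained(name)
quant_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(
    name, quant_config, trust_remote_code=True
)

# A handful of tokenized calibration samples drive the layer-wise solve.
examples = [tok("Calibration text for GPTQ.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("mpt-7b-gptq-4bit")
```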

Closing this since it seems foundry has added the support required for GPTQ to run, although GPTQ has not yet implemented quantization for MPT models.

> I'll just add my usual 2c on this subject: I would love if llama.cpp supported all major model types, bringing its hundreds of wonderful features to as many models...

> Too much work. Maybe once I get around to writing a binary that runs an exported ggml graph using CUDA (realistically in a few months at the earliest). A...

MosaicML (the creators of MPT) was just acquired by Databricks for $1.3B, so I expect more LLM initiatives from them. That is even more of an argument to start supporting their Foundry models. @slaren since...