Casper

295 comments of Casper

> Frankly, this is a very puzzling issue. No, as far as I know there should not be discrepancies between Windows and Linux, provided that you use the same Poppler...

> Not possible. This would require an extra parameter, and we already have almost too many of those, so I'm somewhat disinclined; I lean more toward rule-of-thumb/ballpark estimations in that case...

This seriously looks good. Is RTN used for the kv-cache quantization?
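(For context, RTN is round-to-nearest: scale values into an integer range and round, with no calibration pass. A minimal sketch of per-tensor symmetric RTN in PyTorch; the `rtn_quantize` helper and the fake KV-cache slice are purely illustrative, not from any particular implementation.)

```python
import torch

def rtn_quantize(x: torch.Tensor, n_bits: int = 8):
    """Symmetric round-to-nearest (RTN) quantization of a tensor.

    Returns the int8 tensor and the per-tensor scale, so that
    x ≈ q.float() * scale.
    """
    qmax = 2 ** (n_bits - 1) - 1                   # 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

# Illustrative only: quantize a fake KV-cache slice and check the error.
kv = torch.randn(4, 64)
q, scale = rtn_quantize(kv)
print("max abs error:", (kv - q.float() * scale).abs().max().item())
```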

You need to use float16 or half for quantization.
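In practice that means loading the base weights in half precision before calling `quantize`. A hedged sketch along the lines of the AutoGPTQ README, assuming a recent AutoGPTQ version where `from_pretrained` accepts `torch_dtype`; the model id, calibration text, and output directory are placeholders:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # placeholder: any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# One or more calibration examples for GPTQ.
examples = [tokenizer("AutoGPTQ is a quantization library based on the GPTQ algorithm.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

# Load the full-precision weights in float16/half before quantizing;
# quantizing from float32 tends to fail or waste memory.
model = AutoGPTQForCausalLM.from_pretrained(
    model_id,
    quantize_config,
    torch_dtype=torch.float16,
)

model.quantize(examples)
model.save_quantized("opt-125m-4bit")  # placeholder output dir
```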

Did you try upgrading to the latest vLLM?

> @LaaZa: This PR needs a lot of work. For MPT to work with this repo, it'll require changes in the `MPTForCausalLM` class. I have opened [an issue](https://huggingface.co/mosaicml/mpt-7b/discussions/30) on MosaicML, let...

@LaaZa @abhinavkulkarni MosaicML has now merged changes that allow outputting attention and the use of a device map. Maybe this PR will be easier now?
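(For anyone landing here: with those upstream changes, MPT can be loaded through the standard transformers API with a device map. A hedged sketch; `device_map="auto"` requires accelerate to be installed:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT ships custom modeling code on the Hub, hence trust_remote_code=True.
# device_map="auto" (via accelerate) shards weights across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)
```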

That would be amazing @PanQiWei. I have yet to see much support for MPT despite their models being the best open-source foundation models. It would be a significant step to...

> Thank you for sharing those insights! It would help me a lot as I develop support for the MPT model!

Hi @PanQiWei, how is it going with the efforts to...

I successfully quantized the MPT model with AutoGPTQ. Posted the model here: https://huggingface.co/casperhansen/mpt-7b-8k-chat-gptq

Prompt: Write an e-mail to Sam Altman

```
Dear Sam Altman,

I hope this email finds you...
```
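For anyone who wants to try the checkpoint, a hedged loading sketch using AutoGPTQ's `from_quantized`; the device and generation settings here are illustrative, not taken from the model card:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "casperhansen/mpt-7b-8k-chat-gptq"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",
    trust_remote_code=True,  # MPT uses custom modeling code
)

prompt = "Write an e-mail to Sam Altman"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=200)[0]))
```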