Casper issues

Results: 81 issues of Casper

Fixed threaded stream and futures user socket

Fix #1200 and #1174 with Python 3.8.13. The stream does not work anymore with the standard code presented. We have to change the threading approach to run it successfully. Additionally,...

Unhandled exception with UnicornFy

### Version of this library. Using v1.41.0 of unicorn websocket and 0.12.2 of unicorn-fy. ### Solution to Issue cannot be found in the documentation or other Issues and also occurs...

bug

GPTQ support for MPT models

Hi @PanQiWei, I would like to request support for MPT models as they are SOTA with a commercial license. MPT models (Base, Story-Writer, Instruct, Chat): https://huggingface.co/mosaicml/mpt-7b I found an implementation...

Fix convert_dataset_hf.py hanging with excessive num_workers

Background: PyTorch's DataLoader hangs on several machines (locally, VM, colab) because of the `num_workers` argument being excessive. Generally, when using multiple processes, we want to scale with the number of...
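A minimal sketch of one way to avoid an excessive `num_workers` value follows. The snippet above is cut off, so this assumes the intent is to scale with the number of available CPU cores; the helper name `capped_num_workers` and its `reserve` parameter are hypothetical and not taken from the pull request.

```python
import os


def capped_num_workers(requested: int, reserve: int = 1) -> int:
    """Clamp a requested worker count to the CPUs available.

    Hypothetical helper: `reserve` cores are left free for the main process.
    A result of 0 means "load data in the main process", which is how
    PyTorch's DataLoader interprets num_workers=0.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    limit = max(available - reserve, 0)
    return min(requested, limit)
```

The returned value could then be passed as the `num_workers` argument of `torch.utils.data.DataLoader` in place of a hard-coded number.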

GPTQ support for quantization

Hi MosaicML. AutoGPTQ is a package trying to provide support for quantizing various LLMs. However, to do so, a few requirements are needed. Here are a few issues: - MPTForCausalLM...

[ENHANCEMENT] New MPT 30B + CUDA support.

MosaicML released its MPT 30B version today with 8k context, with Apache 2.0 license. ![image](https://github.com/ggerganov/llama.cpp/assets/27340033/fdb8a0aa-e73b-4e7f-9f3b-58a389f7b472) ## Why you should support MPT 30B Let me present my argumentation for why MPT...

MergeKit models do not behave the same as the original model

Hi @cg123, I am the author of [AutoAWQ](https://github.com/casper-hansen/AutoAWQ). After being in contact with TheBloke, it seems there are some issues with models from MergeKit. - Weights are not the same...

Support `quantization_config` argument on HF backend

With AutoAWQ, we can fuse layers causing a 2-3x speedup directly by passing a `quantization_config`. If this argument can be supported, it will be possible to evaluate quantized models at...

help wanted

feature request

good first issue

[FEATURE] Implement Dynamic SplitFuse

Dear vLLM maintainers @WoosukKwon and @zhuohan123 (@Yard1), DeepSpeed has released its serving framework which claims to be faster than vLLM. The main speedup comes from [Dynamic SplitFuse](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen#b-dynamic-splitfuse-) which is a...

enhancement

performance

Mixtral 8x7B full finetune with DS zero3: Assertion error

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports. ### Expected Behavior That the model can start...

bug