MagicSource
@WoosukKwon Can you be more specific? I have an HF-based setup:

```python
model = AutoModelForCausalLM.from_pretrained(
    # base_model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
    base_model_path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    load_in_8bit=load_in_8bit,
    device_map="auto",
)
```

How can I...
Got OOM as well, on a 32GB V100 with a 7B LLaMA model. It shouldn't OOM, so why does it?
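For context, here is a back-of-the-envelope estimate of the weight footprint alone (a sketch; exact numbers depend on the checkpoint, and serving frameworks typically preallocate extra memory for the KV cache on top of this):

```python
# Rough fp16 memory estimate for a 7B-parameter model (illustrative only).
params = 7_000_000_000
bytes_per_param = 2  # fp16 = 2 bytes per parameter

weight_gib = params * bytes_per_param / 2**30
print(f"weights alone: ~{weight_gib:.1f} GiB")  # ~13.0 GiB

# Activations, KV cache, and framework overhead come on top of this,
# which is why a 32 GB card can still run out of memory.
```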
I don't think this shows TGI is better, but vLLM's results are somewhat misaligned with Hugging Face transformers'. Not sure if it's a bug or a feature, but certainly the result...
@AlpinDale Can you merge this? Currently, model loading is extremely slow.
@CStanKonrad How about a comparison with LongChat, which was recently added to Vicuna? Are there any pros and cons compared with it?
Hi, does torch.compile work with AWQ? (It seems HF already supports AWQ, but the quantization method might not be the same as this repo's.) How do I enable speculative decoding in HF?
Do we really need to use Python 3.9 syntax when it could easily be made compatible with Python 3.8 and 3.7?
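For illustration, a sketch of how common 3.9-only constructs can be backported (assuming the syntax in question is the PEP 584 dict-merge operator and PEP 585 builtin generics, two frequent culprits; the actual offending lines may differ):

```python
from typing import Dict

# Python 3.9+: merged = defaults | overrides  (PEP 584)
# Python 3.7/3.8-compatible equivalent:
def merge(defaults: Dict[str, int], overrides: Dict[str, int]) -> Dict[str, int]:
    # {**a, **b} has worked since Python 3.5 and matches a | b on 3.9+
    return {**defaults, **overrides}

# Python 3.9+: def f(xs: list[int]) -> dict[str, int]  (PEP 585 builtin generics)
# Pre-3.9: use typing.List / typing.Dict annotations instead, as above.

print(merge({"a": 1}, {"a": 2, "b": 3}))  # → {'a': 2, 'b': 3}
```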
Consider adding keypoints? They can give more accurate information.
It won't happen; don't waste time waiting. It's the same as these Chinese "open-sourced" repos: https://github.com/kwai/KwaiYii https://github.com/XiaoMi/MiLM-6B It's a special phenomenon called `README opensource`.
Please don't pin your hopes on a nonexistent project being open-sourced; this is just PR. Don't take it too seriously.