MagicSource
@WoosukKwon Can you be more specific? I have an HF-based setup:

```python
model = AutoModelForCausalLM.from_pretrained(
    # base_model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
    base_model_path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    load_in_8bit=load_in_8bit,
    device_map="auto",
)
```

How can I...
Got OOM as well, on a 32GB V100 with a 7B LLaMA model. It shouldn't OOM, so why does it?
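For context, here is a back-of-the-envelope estimate of the weight footprint alone (a sketch; exact numbers depend on the checkpoint, and serving frameworks typically preallocate extra memory for the KV cache on top of this):

```python
# Rough fp16 memory estimate for a 7B-parameter model (illustrative only).
params = 7_000_000_000
bytes_per_param = 2  # fp16 = 2 bytes per parameter

weight_gib = params * bytes_per_param / 2**30
print(f"weights alone: ~{weight_gib:.1f} GiB")  # ~13.0 GiB

# Activations, KV cache, and framework overhead come on top of this,
# which is why a 32 GB card can still run out of memory.
```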
I don't think this shows TGI is better, but vLLM's results are somewhat misaligned with Hugging Face transformers'. Not sure if it's a bug or a feature, but certainly the result...
@AlpinDale Can you merge this? Currently, model loading is extremely slow.
@CStanKonrad How about a comparison with LongChat, which was recently added to Vicuna? Are there any pros and cons compared with it?
Hi, does torch.compile work with AWQ? (It seems HF already supports AWQ, but the quantization method might not be the same as this repo's.) How do I enable speculative decoding in HF?
Do we really need to use Python 3.9 syntax when it could easily be made compatible with Python 3.8 and 3.7?
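For illustration, a sketch of how common 3.9-only constructs can be backported (assuming the syntax in question is the PEP 584 dict-merge operator and PEP 585 builtin generics, two frequent culprits; the actual offending lines may differ):

```python
from typing import Dict

# Python 3.9+: merged = defaults | overrides  (PEP 584)
# Python 3.7/3.8-compatible equivalent:
def merge(defaults: Dict[str, int], overrides: Dict[str, int]) -> Dict[str, int]:
    # {**a, **b} has worked since Python 3.5 and matches a | b on 3.9+
    return {**defaults, **overrides}

# Python 3.9+: def f(xs: list[int]) -> dict[str, int]  (PEP 585 builtin generics)
# Pre-3.9: use typing.List / typing.Dict annotations instead, as above.

print(merge({"a": 1}, {"a": 2, "b": 3}))  # → {'a': 2, 'b': 3}
```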
Consider adding keypoints? They can give more accurate information.
It won't happen; don't waste time waiting. It's the same as these Chinese "open-sourced" repos: https://github.com/kwai/KwaiYii https://github.com/XiaoMi/MiLM-6B It's a special phenomenon called `README opensource`.
Please don't pin your hopes on a nonexistent project being open-sourced; this is just PR. Don't take it too seriously.