Volica.X

5 comments of Volica.X

Your VPS is so stable!

Here is a bug: vllm/inputs/preprocess.py line 333 sets prompt_token_ids=[], but vllm/engine/llm_engine.py line 1721 only checks `if prompt_ids is None`.

```
Traceback (most recent call last):
  File "/mnt/bn/integrated-risk-model2/LLM_Inference_Service/llmserver_diy/llmserver/core/vanilla_vllm/vanilla_vllm_scheduler.py", line 42, in inner
    async...
```
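A minimal sketch of the mismatch described above; the function and variable names are illustrative, not copied from the vLLM source:

```
# Illustrative sketch only, not the actual vLLM code paths.

def preprocess(prompt_embeds):
    # preprocess.py-style behavior: when only embeddings are supplied,
    # the token ids come back as an empty list rather than None.
    return {"prompt_token_ids": [], "prompt_embeds": prompt_embeds}

def validate(inputs):
    prompt_ids = inputs["prompt_token_ids"]
    # llm_engine.py-style guard: `is None` does not catch the empty list,
    # so the embeds-only request slips past and fails further downstream.
    if prompt_ids is None:
        raise ValueError("prompt has no token ids")
    # A check like `if not prompt_ids` (or explicitly allowing prompt_embeds)
    # would treat [] and None consistently.
```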

And here is another bug I have not found the reason for yet, using the InternLM2 model. Input:

```
prompt_embeds is a torch.Tensor of shape torch.Size([1777, 4096])
SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0,...
```
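For context, a minimal sketch of the kind of request that triggers this, assuming the embeddings are passed as a `prompt_embeds` field the way the PR under test exposes it; that field name and the commented generate call are assumptions, not confirmed API:

```
import torch
from vllm import SamplingParams

# Greedy decoding, matching the SamplingParams dump above.
sampling_params = SamplingParams(n=1, temperature=0.0, top_p=1.0, top_k=-1)

# Embedding input as reported: one sequence of 1777 token embeddings,
# hidden size 4096 (InternLM2-7B). Random values here, just for the shape.
prompt_embeds = torch.randn(1777, 4096, dtype=torch.bfloat16)

# ASSUMPTION: the PR accepts an embeds-only prompt dict like this;
# the exact key and call signature may differ.
request = {"prompt_embeds": prompt_embeds}
# engine.generate(request, sampling_params, request_id="0")  # sketched, not run here
```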

> Are you using speculative decoding? It's not supported with input embeds yet.

No, here is my config:

```
AsyncEngineArgs(model='./internlm2-chat-7b', served_model_name=None, tokenizer='./internlm2-chat-7b', skip_tokenizer_init=False, tokenizer_mode='auto', trust_remote_code=True, download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='auto',...
```

I tried the newest PR using the command `gh pr checkout 6869`, but it seems to have failed:

```
[rank0]: Traceback (most recent call last):
[rank0]:   File "internlm2/embed_vllm.py.py", line 45, in
[rank0]:   File "vllm-master/vllm/vllm/utils.py",...
```