AmazDeng

Results: 49 comments of AmazDeng

I have found the answer; see these two links: https://github.com/haotian-liu/LLaVA/issues/615 and https://github.com/haotian-liu/LLaVA/releases/tag/v1.0.1 The author says: "Pretraining. We simplified the pretraining prompts by removing additional instructions like Describe the image details,..."

Can vLLM support direct input of inputs_emb now? If so, we could leverage vLLM's inference capabilities with minimal changes to the model inference code. Moreover, since model architectures are diverse,...
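
For context, here is a minimal sketch of the kind of interface being asked for, using Hugging Face transformers (which already accepts precomputed embeddings via `generate`), not vLLM itself; the checkpoint and prompt are placeholders:

```python
# Sketch: generation from precomputed embeddings, illustrating the
# inputs_emb-style interface the comment asks vLLM to support.
# Checkpoint and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

ids = tokenizer("Describe the image.", return_tensors="pt").input_ids.to(model.device)
# Build the embeddings by hand; a VLM would splice projected vision
# features into this tensor before generation.
inputs_embeds = model.get_input_embeddings()(ids)

out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The appeal is that once the runtime accepts embeddings directly, any architecture that can produce them can reuse the same inference engine without per-model integration work.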

> The loading process of a VLM is: load vision model -> load LLM weights -> allocate KV cache.
>
> For llava-v1.5-7b, the first two steps will take up about...
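
A quick way to see how much memory each of those stages occupies is to check the allocator between steps; `load_vision_model` and `load_llm_weights` below are hypothetical stand-ins for the real loaders:

```python
# Sketch: reporting GPU memory after each loading stage described above.
# load_vision_model / load_llm_weights are hypothetical placeholders.
import torch

def report(stage: str) -> None:
    gib = torch.cuda.memory_allocated() / 1024**3
    print(f"after {stage}: {gib:.2f} GiB allocated")

vision_tower = load_vision_model()  # hypothetical loader
report("vision model")
llm = load_llm_weights()            # hypothetical loader
report("llm weights")
# vLLM then pre-allocates most of the remaining memory for the KV cache
# (bounded by its gpu_memory_utilization setting).
```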

> Can you try running the code without Jupyter or IPython?

I only have one card, and it's currently running a program, so I can't test it right now. I'll...

> Did you solve this problem? I encountered exactly the same problem.

I didn't solve it; I switched to a different architecture instead.

I'm having a similar problem to you.

1. I deployed sglang and loaded the llava-next image model, but sglang can only do a single inference. If I do batch inference,...
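
For reference, batched requests are typically expressed through sglang's frontend as in the sketch below; the model path and image files are placeholders, and the exact API may differ between sglang versions:

```python
# Sketch: batched multimodal inference with sglang's frontend API.
# Model path and image files are placeholders; API details may vary
# across sglang versions.
import sglang as sgl

@sgl.function
def caption(s, image_path):
    s += sgl.user(sgl.image(image_path) + "Describe this image.")
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

runtime = sgl.Runtime(model_path="liuhaotian/llava-v1.6-vicuna-7b")
sgl.set_default_backend(runtime)

# run_batch schedules all requests together instead of one at a time.
states = caption.run_batch([
    {"image_path": "a.jpg"},
    {"image_path": "b.jpg"},
])
for state in states:
    print(state["answer"])

runtime.shutdown()
```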

> what is your trt infer code?

The inference code is like this; I didn't set up any multi-process or multi-threaded inference operations in it:

```
from openclip_trt.tensorrt_utils import TensorRTModel...
```

@LeoZDong Does TensorRT support multithreaded inference? Note that I mean multithreading, not multiprocessing.
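
If it helps, the pattern TensorRT's documentation describes for threads is to share one deserialized engine and give each thread its own execution context; a minimal sketch, with the engine file as a placeholder and buffer/stream setup elided:

```python
# Sketch: one shared ICudaEngine, one IExecutionContext per thread.
# Execution contexts are not thread-safe, so each thread makes its own.
# Engine path is a placeholder; buffer and stream setup are elided.
import threading
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("model.plan", "rb") as f:  # placeholder engine file
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

def worker(thread_id: int) -> None:
    context = engine.create_execution_context()  # per-thread context
    # ... allocate per-thread device buffers and a CUDA stream here,
    # then run context.execute_async_v2(bindings, stream_handle)
    # (or execute_async_v3 on newer TensorRT versions) ...
    print(f"thread {thread_id} has its own execution context")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```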