AmazDeng

Results: 49 comments of AmazDeng

I have found the answer; see these two links: https://github.com/haotian-liu/LLaVA/issues/615 and https://github.com/haotian-liu/LLaVA/releases/tag/v1.0.1 The author says: "Pretraining. We simplified the pretraining prompts by removing additional instructions like Describe the image details,..."

Can vLLM support direct input of inputs_emb now? If so, we could leverage vLLM's inference capabilities with minimal changes to the model inference code. Moreover, since model architectures are diverse,...
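
For context, here is a minimal sketch of the kind of interface being asked for, using Hugging Face transformers (which already accepts precomputed embeddings via `generate`), not vLLM itself; the checkpoint and prompt are placeholders:

```python
# Sketch: generation from precomputed embeddings, illustrating the
# inputs_emb-style interface the comment asks vLLM to support.
# Checkpoint and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

ids = tokenizer("Describe the image.", return_tensors="pt").input_ids.to(model.device)
# Build the embeddings by hand; a VLM would splice projected vision
# features into this tensor before generation.
inputs_embeds = model.get_input_embeddings()(ids)

out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The appeal is that once the runtime accepts embeddings directly, any architecture that can produce them can reuse the same inference engine without per-model integration work.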

> The loading process of a VLM is: load vision model -> load LLM weights -> allocate KV cache.
>
> For llava-v1.5-7b, the first two steps will take up about...
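
A quick way to see how much memory each of those stages occupies is to check the allocator between steps; `load_vision_model` and `load_llm_weights` below are hypothetical stand-ins for the real loaders:

```python
# Sketch: reporting GPU memory after each loading stage described above.
# load_vision_model / load_llm_weights are hypothetical placeholders.
import torch

def report(stage: str) -> None:
    gib = torch.cuda.memory_allocated() / 1024**3
    print(f"after {stage}: {gib:.2f} GiB allocated")

vision_tower = load_vision_model()  # hypothetical loader
report("vision model")
llm = load_llm_weights()            # hypothetical loader
report("llm weights")
# vLLM then pre-allocates most of the remaining memory for the KV cache
# (bounded by its gpu_memory_utilization setting).
```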

> Can you try running the code without Jupyter or IPython?

I only have one card, and it's currently running a program, so I can't test it right now. I'll...

> Did you solve this problem? I encountered exactly the same problem.

I didn't solve it; I switched to a different architecture instead.

I'm having a similar problem to you.

1. I deployed sglang and loaded the llava-next image model, but sglang can only do a single inference. If I do batch inference,...
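
For reference, batched requests are typically expressed through sglang's frontend as in the sketch below; the model path and image files are placeholders, and the exact API may differ between sglang versions:

```python
# Sketch: batched multimodal inference with sglang's frontend API.
# Model path and image files are placeholders; API details may vary
# across sglang versions.
import sglang as sgl

@sgl.function
def caption(s, image_path):
    s += sgl.user(sgl.image(image_path) + "Describe this image.")
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

runtime = sgl.Runtime(model_path="liuhaotian/llava-v1.6-vicuna-7b")
sgl.set_default_backend(runtime)

# run_batch schedules all requests together instead of one at a time.
states = caption.run_batch([
    {"image_path": "a.jpg"},
    {"image_path": "b.jpg"},
])
for state in states:
    print(state["answer"])

runtime.shutdown()
```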

> what is your trt infer code?

The inference code is like this; I didn't set up any multi-process or multi-threaded inference operations in it:

```
from openclip_trt.tensorrt_utils import TensorRTModel...
```

@LeoZDong Does TensorRT support multithreaded inference? Note that I mean multithreading, not multiprocessing.
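
If it helps, the pattern TensorRT's documentation describes for threads is to share one deserialized engine and give each thread its own execution context; a minimal sketch, with the engine file as a placeholder and buffer/stream setup elided:

```python
# Sketch: one shared ICudaEngine, one IExecutionContext per thread.
# Execution contexts are not thread-safe, so each thread makes its own.
# Engine path is a placeholder; buffer and stream setup are elided.
import threading
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("model.plan", "rb") as f:  # placeholder engine file
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

def worker(thread_id: int) -> None:
    context = engine.create_execution_context()  # per-thread context
    # ... allocate per-thread device buffers and a CUDA stream here,
    # then run context.execute_async_v2(bindings, stream_handle)
    # (or execute_async_v3 on newer TensorRT versions) ...
    print(f"thread {thread_id} has its own execution context")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```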