whyiug comments

Results: 14 comments of whyiug

[Core][VLM] Support image embeddings as input

@ywang96 This is a fantastic feature! I've encountered a tricky problem. I need to perform multiple VQA tasks (same image, different questions) using the same model architecture (e.g., PaliGemma). In...

[Core][VLM] Support image embeddings as input

> > How can I build an on-the-fly inference service (like a compatible server or something similar)? At this point, the solution I can think of is having the image...

[Core][VLM] Support image embeddings as input

I was about to add this feature to qwen2vl, but I've run into some difficulties. For example, I can't rely on image embeddings alone to generate new prompt_token_ids without...

[Core][VLM] Support image embeddings as input

> > I was about to add this feature to qwen2vl, but I've run into some difficulties. For example, I can't rely on image embeddings alone to generate new...