whyiug
@ywang96 This is a fantastic feature! I've encountered a tricky problem. I need to perform multiple VQA tasks (same image, different questions) using the same model architecture (e.g., PaliGemma). In...
> > How can I build an on-the-fly inference service (like a compatible server or something similar)? At this point, the solution I can think of is having the image...
I was about to add this feature to qwen2vl, but unfortunately I've run into some difficulties. For example, I can't just rely on image embedding to generate new prompt_token_ids without...
> > I was about to add this feature to qwen2vl, but unfortunately I've run into some difficulties. For example, I can't just rely on image embedding to generate new...