Roger Wang comments

Results: 132 comments of Roger Wang

[Core][VLM] Support image embeddings as input

@Isotr0py It looks like the generation of image embedding from pixel values and merging with text embedding is currently tied together under `Phi3HDImageEmbedding`. Could you take a look to decouple...

[Core][VLM] Support image embeddings as input

> @ywang96 Ok, I will decouple them tonight. (Sorry that I don't have bandwidth at daytime)

No rush at all, and thank you for helping out!

[Core][VLM] Support image embeddings as input

@DarkLight1337 Please give this PR a first pass - I have updated all vision language models except two: - `Chameleon` (since the model itself is only input embedding based). -...

[Core][VLM] Support image embeddings as input

> The only small change I would make is to add an `assert_never` guard at the end of each `_parse_and_validate_image_input` function to make sure that we have handled all of...
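
For readers unfamiliar with the `assert_never` guard suggested above, here is a minimal sketch of the pattern in isolation. The input classes and the `describe_image_input` function are hypothetical stand-ins invented for illustration; they are not the actual vLLM types or the real `_parse_and_validate_image_input` implementation.

```python
from dataclasses import dataclass
from typing import List, Union

try:
    from typing import assert_never  # available in Python 3.11+
except ImportError:
    from typing import NoReturn

    def assert_never(value) -> NoReturn:
        # Fallback for older interpreters, mirroring the stdlib behaviour.
        raise AssertionError(f"Expected code to be unreachable, but got: {value!r}")


# Hypothetical input variants: raw pixels vs. precomputed embeddings.
@dataclass
class PixelInputs:
    pixel_values: List[List[float]]


@dataclass
class EmbeddingInputs:
    image_embeds: List[List[float]]


ImageInputs = Union[PixelInputs, EmbeddingInputs]


def describe_image_input(inputs: ImageInputs) -> str:
    """Handle every variant explicitly, then guard the fall-through."""
    if isinstance(inputs, PixelInputs):
        return "pixel_values"
    if isinstance(inputs, EmbeddingInputs):
        return "image_embeds"
    # A static type checker flags this call if a new variant is added to
    # ImageInputs without a matching branch; at runtime it raises.
    assert_never(inputs)
```

The benefit is that a forgotten variant is reported by the type checker, and fails loudly at runtime, instead of silently falling through.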

[Core][VLM] Support image embeddings as input

On a side note, I realized that supporting image embeddings as input is also not feasible for `Fuyu`, because the image processor adds additional logic involving the tokenizer. Maybe @Isotr0py has...

[Core][VLM] Support image embeddings as input

@DarkLight1337 This PR is ready for final review. I have added a test with Llava 1.5 and updated the documentation.

[Core][VLM] Support image embeddings as input

Hey @Andcircle! Thanks for reaching out! Yes, as you mentioned, what this PR does is allow image embeddings as input so that users can process images into embeddings separately...

[Core][VLM] Support image embeddings as input

> @ywang96 Thanks for your fast response! Yes, I think #6869 should free us. > > Just to clarify, #6869's use case can be much broader =) not...

[Core][VLM] Support image embeddings as input

> > > @ywang96 Thanks for your fast response! Yes, I think #6869 should free us. > > > Just to clarify, #6869's use case can be much...

[Core][VLM] Support image embeddings as input

@Isotr0py Hey, do you think it makes sense to support image embeddings for Fuyu? (Currently we cannot easily do it, since the embedding creation is tied to the tokenizer.) We don't...