VisionLLM Series

Results 15 VisionLLM issues

May I know when you will release your code or the full details of your paper?

Hello! I am urgently asking for the release of the inference code + model. Training would be good too. Incredibly thankful, very interesting project!

Hi, your work is great! But I am confused about the location tokens you used in the decoder. Could you provide more details about them?

I think that you push the tokens below into the LLM ``` ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ...] ``` about object detection...

Are segmentation outputs (coordinates) directly predicted by the network as floating-point numbers under a next-token prediction loss? This part is quite unclear in the paper. Or are they regressed (using...

What is the training time of the whole model?

An issue is found in recurrence. Location tokens, {,... , , ... , }. It is used when tokenizer decodes, where the LLM comes out with some offset coordinates relative...

Thanks for your awesome work! VisionLLM opens a way towards a generalist vision and language model. However, from the result in the single task vs. multiple tasks in ablation study,...

Hello, thanks for your wonderful work on VisionLLM v2. I'm very interested in your paper. I wonder when the model checkpoint will be released. I would be so grateful if...

Thanks for your wonderful work! May I ask for a detailed list of hundreds of public vision and vision-language tasks mentioned in the v2 paper?