VisionLLM

[REQUEST] Code and models please!

Open · spacewalkingninja opened this issue 2 years ago • 19 comments

Hello! I am urgently asking for the release of the inference code and model weights. Training code would be great too. Incredibly thankful, very interesting project!

spacewalkingninja avatar May 22 '23 18:05 spacewalkingninja

When will the code be released?

mtjhl avatar May 25 '23 03:05 mtjhl

+1. Looking forward to the code. Interesting project.

hjq133 avatar May 26 '23 01:05 hjq133

+1. I am looking forward to the code. This is awesome work.

wzhings avatar May 26 '23 15:05 wzhings

+1

wojiaohumaocheng avatar May 30 '23 11:05 wojiaohumaocheng

+1

karthikyeredla avatar Jun 06 '23 07:06 karthikyeredla

@czczup can you please enlighten us from the realms of the model and code lands <3

spacewalkingninja avatar Jun 07 '23 14:06 spacewalkingninja

What training data was used? Is it publicly available?

mpragnay avatar Jun 09 '23 06:06 mpragnay

Any update?

autosquid avatar Jun 29 '23 03:06 autosquid

I think all of you are wasting your time waiting for this. Check out the original LLaVA paper; it has code, a demo, and everything you need to get started. I do, however, thank the authors of this paper for referencing it and letting us know it exists :)

amygbAI avatar Jul 05 '23 16:07 amygbAI

Hi Bruno, I would say the objectives are 100% the same, so it's better to go with a Microsoft research paper that has code rather than some random copy of it. Obviously the authors don't seem to care much anymore.

On Thu, Sep 14, 2023, Bruno Ma wrote:

Hi @amygbAI, do you mean this paper is totally the same as LLaVA?

amygbAI avatar Sep 14 '23 13:09 amygbAI

Hi, may I know if you were able to reproduce the results? Do you mean this work uses the same training objective as LLaVA? Thanks.

GuangxingHan avatar Sep 19 '23 05:09 GuangxingHan

Hi Mr. Han, no, I haven't reproduced the results, because I would like to train this exclusively on chart/graph data and I'm still preparing the datasets. Having said that, I believe the training objective is the same, i.e.:

  • ensure that the model can take both images and text as input
  • perform analysis over both the image and textual contexts
  • provide the results of the query in textual format

If you go through the LLaVA paper, this will be amply evident.
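
For anyone who wants to try this image-plus-text-in, text-out setup while waiting for a VisionLLM release, here is a minimal inference sketch against the publicly released LLaVA-1.5 checkpoint on Hugging Face. The checkpoint name (llava-hf/llava-1.5-7b-hf), the example image URL, and the prompt wording are illustrative assumptions; the only real requirements are a recent transformers version and a GPU.

```python
# Minimal LLaVA-style inference sketch.
# Assumptions: transformers >= 4.36, a single GPU, and the public
# llava-hf/llava-1.5-7b-hf checkpoint; image URL and prompt are placeholders.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is only an example.
image_url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# LLaVA-1.5 chat format: the <image> token marks where visual features are inserted.
prompt = "USER: <image>\nDescribe what is shown in this image.\nASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The generated output is the assistant's textual answer conditioned on both the image features and the prompt, which is exactly the "image + text in, text out" objective described in the list above.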

amygbAI avatar Sep 19 '23 06:09 amygbAI

Thanks for your reply. Yes, LLaVA works exactly in this way.

GuangxingHan avatar Sep 19 '23 06:09 GuangxingHan

@czczup Can you provide a timeline for the code release? Thanks!

becauseofAI avatar Sep 27 '23 09:09 becauseofAI

I see that this paper was accepted at NeurIPS 2023, which was held a month ago. It's January 2024 now. Is the code going to be released?

shaniaos avatar Jan 12 '24 05:01 shaniaos

+1

zzchust avatar Feb 04 '24 14:02 zzchust

Waiting for the code release.

annopackage avatar Mar 01 '24 06:03 annopackage

This is a direct message from the Intergalactic Open Source Allegiance: RELEASE THIS MODEL TODAY.

spacewalkingninja avatar Mar 01 '24 15:03 spacewalkingninja

If needed, anyone may try the GiT repository, a general end-to-end vision transformer that fully covers the tasks included in VisionLLM and can also handle semantic segmentation. The code and pre-trained weights have been fully open-sourced.

"GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Haiyang-W avatar May 12 '24 16:05 Haiyang-W