VisionLLM

[REQUEST] Code and models please!

Open · spacewalkingninja opened this issue 2 years ago • 19 comments

Hello! I am urgently asking for the release of the inference code and model weights. Training code would be great too. Incredibly thankful, very interesting project!

spacewalkingninja avatar May 22 '23 18:05 spacewalkingninja

When will the code be released?

mtjhl avatar May 25 '23 03:05 mtjhl

+1. Looking forward to the code. Interesting project.

hjq133 avatar May 26 '23 01:05 hjq133

+1. I am looking forward to the code. This is awesome work.

wzhings avatar May 26 '23 15:05 wzhings

+1

wojiaohumaocheng avatar May 30 '23 11:05 wojiaohumaocheng

+1

karthikyeredla avatar Jun 06 '23 07:06 karthikyeredla

@czczup can you please enlighten us from the realms of the model and code lands <3

spacewalkingninja avatar Jun 07 '23 14:06 spacewalkingninja

What training data was used? Is it publicly available?

mpragnay avatar Jun 09 '23 06:06 mpragnay

Any update?

autosquid avatar Jun 29 '23 03:06 autosquid

I think all of you are wasting your time waiting for this. Check out the original LLaVA paper; it has code, a demo, and everything you need to get started. I do, however, thank the authors of this paper for referencing it and letting us know it exists :)

amygbAI avatar Jul 05 '23 16:07 amygbAI

Hi Bruno, I would say the objectives are 100% the same, so it's better to go with a Microsoft research paper that has code rather than some random copy of it. Obviously the authors don't seem to care much anymore.

On Thu, Sep 14, 2023, Bruno Ma wrote:

Hi @amygbAI, do you mean this paper is totally the same as LLaVA?

amygbAI avatar Sep 14 '23 13:09 amygbAI

Hi, may I know if you were able to reproduce the results? Do you mean this work uses the same training objective as LLaVA? Thanks.

GuangxingHan avatar Sep 19 '23 05:09 GuangxingHan

Hi Mr. Han, no, I haven't reproduced the results, because I would like to train this exclusively on chart/graph data and I'm still preparing the datasets. Having said that, I believe the training objective is the same, i.e.:

  • ensure that the model can take both images and text as input
  • perform analysis over both the image and textual contexts
  • provide the results of the query in textual format

If you go through the LLaVA paper, this will be amply evident.
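
For anyone who wants to try this image-plus-text-in, text-out setup while waiting for a VisionLLM release, here is a minimal inference sketch against the publicly released LLaVA-1.5 checkpoint on Hugging Face. The checkpoint name (llava-hf/llava-1.5-7b-hf), the example image URL, and the prompt wording are illustrative assumptions; the only real requirements are a recent transformers version and a GPU.

```python
# Minimal LLaVA-style inference sketch.
# Assumptions: transformers >= 4.36, a single GPU, and the public
# llava-hf/llava-1.5-7b-hf checkpoint; image URL and prompt are placeholders.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is only an example.
image_url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# LLaVA-1.5 chat format: the <image> token marks where visual features are inserted.
prompt = "USER: <image>\nDescribe what is shown in this image.\nASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The generated output is the assistant's textual answer conditioned on both the image features and the prompt, which is exactly the "image + text in, text out" objective described in the list above.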

amygbAI avatar Sep 19 '23 06:09 amygbAI

Thanks for your reply. Yes, LLaVA works exactly in this way.

GuangxingHan avatar Sep 19 '23 06:09 GuangxingHan

@czczup Can you provide a timeline for the code release? Thanks!

becauseofAI avatar Sep 27 '23 09:09 becauseofAI

I see that this paper was accepted at NeurIPS 2023, which was held a month ago. It's January 2024 now. Is the code going to be released?

shaniaos avatar Jan 12 '24 05:01 shaniaos

+1

zzchust avatar Feb 04 '24 14:02 zzchust

Waiting for the code release.

annopackage avatar Mar 01 '24 06:03 annopackage

This is a direct message from the Intergalactic Open Source Allegiance: RELEASE THIS MODEL TODAY.

spacewalkingninja avatar Mar 01 '24 15:03 spacewalkingninja

If needed, anyone may try the GiT repository, a general end-to-end vision transformer that fully covers the tasks included in VisionLLM and can also handle semantic segmentation. The code and pre-trained weights have been fully open-sourced.

"GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Haiyang-W avatar May 12 '24 16:05 Haiyang-W