DeViSE-zero-shot-classification

Latest Paper related to DeViSE

Open UmarMajeed-Rana opened this issue 6 years ago • 4 comments

Hi Fabio. I read your article on Medium. For some reason I am not able to post a response there. I enjoyed reading your explanation of the paper. Can you point me to recent advancements in this space? I see the paper was published in 2013, but it still looks relevant: simple and powerful.

UmarMajeed-Rana avatar Jun 18 '19 22:06 UmarMajeed-Rana

Hi Umar,

thank you for your interest! If you are interested in these kinds of models, I suggest you look next at image caption generation (https://cs.stanford.edu/people/karpathy/cvpr2015.pdf). The authors "describe neural networks that map words and image regions into a common, multimodal embedding", which shares some similarities with DeViSE. Better approaches are, however, presented in articles like these (1: https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Vinyals_Show_and_Tell_2015_CVPR_paper.pdf , 2: https://arxiv.org/pdf/1502.03044.pdf ). I'm working on a detailed image caption generation tutorial that should be on Medium within the next few weeks. Best regards, Fabio

fg91 avatar Jun 19 '19 09:06 fg91

Thank you for your response, Fabio. Do "Show and Tell" and the other article you mentioned also map images and text into the same common space? I am interested in doing reverse image search in the fashion domain: searching for images in a corpus by writing a description.

UmarMajeed-Rana avatar Jun 19 '19 22:06 UmarMajeed-Rana

No, the models proposed in the "Show and Tell" and "Show, Attend and Tell" papers directly generate captions using an RNN decoder taking the representations generated by a CNN encoder as input.
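To make the distinction concrete, here is a minimal sketch of the encoder-decoder pattern: the encoder output seeds the decoder state, and the decoder then emits one token at a time until it produces an end token. The names `greedy_caption` and `toy_step` are hypothetical, and `toy_step` is just a lookup table standing in for a trained RNN cell. This is not code from either paper.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
StepFn = Callable[[str, State], Tuple[str, State]]


def greedy_caption(
    image_features: State,
    step: StepFn,
    start_token: str = "<s>",
    end_token: str = "</s>",
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding: feed each emitted token back in as the next input."""
    state = image_features  # the encoder output initialises the decoder state
    token = start_token
    caption: List[str] = []
    for _ in range(max_len):
        token, state = step(token, state)
        if token == end_token:
            break
        caption.append(token)
    return caption


# Hypothetical stand-in for a trained decoder step: a fixed next-token table.
_NEXT_TOKEN = {"<s>": "a", "a": "red", "red": "dress", "dress": "</s>"}


def toy_step(prev_token: str, state: State) -> Tuple[str, State]:
    # A real RNN cell would update `state`; this toy version passes it through.
    return _NEXT_TOKEN.get(prev_token, "</s>"), state


print(" ".join(greedy_caption((0.1, 0.9), toy_step)))
```

Nothing in this loop produces a text vector that can be compared with an image vector, which is why such a model does not give you text-to-image search directly.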

However, the last paragraph in the left column in the section "Related Work" in the "Show and Tell" paper (page 2) talks about previous articles doing exactly what you intend to do as far as I understand. Maybe check the paragraph out (the one starting with "A large body of work has addressed the problem of ranking descriptions for a given image..."). Is this what you were looking for?

Best regards,

Fabio

fg91 avatar Jun 21 '19 10:06 fg91

Hi Fabio

Exactly, I am looking for something similar. I am looking for recent papers in this domain so that I can extend them with a focus on fashion. I have seen many attention-based models for image captioning, but they do not place images in a common embedding space with text. At inference time I will have only text with which to find the relevant image.
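Roughly, the inference step I have in mind looks like the sketch below, assuming some model has already embedded both the images and the query text into the same space. The vectors, file names, and the function `search_images` are made up for illustration; a real system would get the embeddings from trained image and text encoders.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search_images(
    query_vec: Sequence[float],
    image_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank images by similarity to the text query in the shared space."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-d embeddings standing in for precomputed image vectors.
image_index = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "blue_jeans.jpg": [0.0, 0.2, 0.9],
    "red_scarf.jpg": [0.7, 0.6, 0.1],
}

# Pretend this is the text encoder's embedding of the query "red dress".
query = [1.0, 0.2, 0.0]

for name, score in search_images(query, image_index, top_k=2):
    print(f"{name}: {score:.3f}")
```

The image vectors can be computed once for the whole corpus, so only the text query needs to be embedded at search time.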

UmarMajeed-Rana avatar Jun 22 '19 23:06 UmarMajeed-Rana