densecap

How can I use natural language queries to retrieve the source image?

Open 664852049 opened this issue 8 years ago • 3 comments

In your paper, the dense captioning model supports image retrieval using natural language queries and can localize those queries in the retrieved images. How can I perform this retrieval?

664852049 avatar Jul 11 '16 07:07 664852049

We don't have code for that in this repo, but it's relatively simple.

First use the extractFeatures method to get boxes and features for the database images:

https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285

Next run the LanguageModel (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285) and LanguageModelCriterion (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L120) forward using the features and the query to compute the log-likelihood for the query on the extracted boxes.

Finally, use these log-likelihoods to sort all boxes across all database images; the highest-scoring boxes (and their source images) are the retrieval results.
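The three steps above can be sketched roughly as follows. This is a minimal illustration, not code from the repo: NumPy arrays stand in for Torch tensors, `rank_boxes` and the dot-product score are hypothetical placeholders for the real per-box query log-likelihood that `LanguageModel` / `LanguageModelCriterion` would compute:

```python
import numpy as np

def rank_boxes(image_features, query_vec):
    """Rank every box from every database image against a query.

    image_features: dict mapping image id -> (num_boxes, feat_dim) array,
    as extractFeatures would produce per image. query_vec: (feat_dim,)
    encoding of the query. The dot product below is only a stand-in for
    the language-model log-likelihood of the query given each box.
    """
    scored = []
    for img_id, feats in image_features.items():
        # One log-likelihood-like score per box in this image.
        scores = feats @ query_vec
        for box_idx, s in enumerate(scores):
            scored.append((float(s), img_id, box_idx))
    # Higher score first: best-matching boxes (and images) on top.
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored

db = {"img_a": np.array([[1.0, 0.0], [0.0, 1.0]]),
      "img_b": np.array([[2.0, 0.0]])}
query = np.array([1.0, 0.0])
print(rank_boxes(db, query)[0])  # best box: (2.0, 'img_b', 0)
```

Retrieval then reduces to reading image ids off the top of the sorted list; the (image id, box index) pairs also give the localization of the query within each retrieved image.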

jcjohnson avatar Jul 11 '16 16:07 jcjohnson

@jcjohnson can open-world object detection be done in a similar way? Also, how do you use the boxes and features from extractFeatures in self.nets.language_model:forward() (the function expects image vectors and gt labels) and self.crits.lm_crit:forward()?

MohitShridhar avatar Jul 16 '16 09:07 MohitShridhar

Has anyone got a working demo of this by chance?

brannondorsey avatar May 08 '17 19:05 brannondorsey