densecap
How can I use natural language queries to retrieve the source image?
In your paper, you state that your dense captioning model can support image retrieval using natural language queries, and can localize these queries in the retrieved images. How can I do this retrieval?
We don't have code for that in this repo, but it's relatively simple.
First, use the extractFeatures method to get boxes and features for all of the database images:
https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285
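For concreteness, the offline indexing pass might look something like the sketch below. This is untested; `model` is assumed to be a loaded DenseCapModel, and `database_images` / `feature_cache` are hypothetical names, assuming extractFeatures takes one preprocessed image tensor and returns (boxes, feats):

```lua
-- Minimal sketch (untested). Assumes extractFeatures returns
-- (boxes, feats) for a single preprocessed image tensor.
local feature_cache = {}
for img_id, img in pairs(database_images) do
  local boxes, feats = model:extractFeatures(img)
  -- Cache per-image boxes and box features for the scoring step.
  feature_cache[img_id] = {boxes = boxes, feats = feats}
end
```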
Next, run the LanguageModel (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L285) and LanguageModelCriterion (https://github.com/jcjohnson/densecap/blob/master/densecap/DenseCapModel.lua#L120) forward, using the features and the query, to compute the log-likelihood of the query for each extracted box.
Finally, use these log-likelihoods to rank all boxes across all images.
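There is no code for this in the repo, but a rough sketch of the scoring and ranking steps (untested, with assumed tensor shapes, and using the hypothetical `feature_cache` from above) could look like this. `query_seq` is assumed to be the query encoded as a 1D tensor of token indices from the model's vocabulary:

```lua
-- Minimal sketch (untested); exact shapes and batching are assumptions.
local results = {}
for img_id, entry in pairs(feature_cache) do
  local B = entry.feats:size(1)
  local T = query_seq:size(1)
  -- Replicate the query so every box is scored against the same caption.
  local seq = query_seq:view(1, T):expand(B, T)
  -- The language model's forward expects image vectors plus a
  -- "ground-truth" sequence, as noted in the comment below.
  local lm_out = model.nets.language_model:forward{entry.feats, seq}
  for b = 1, B do
    -- Scoring one box at a time turns the criterion's mean negative
    -- log-likelihood into a per-box score.
    local nll = model.crits.lm_crit:forward(lm_out[{{b}}], seq[{{b}}])
    table.insert(results, {img = img_id, box = entry.boxes[b], ll = -nll})
  end
end
-- Highest log-likelihood first: the top entries give both the retrieved
-- images and the box localizing the query within each image.
table.sort(results, function(a, b) return a.ll > b.ll end)
```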
@jcjohnson, can open-world object detection be done in a similar way?
Also, how do you use the boxes and features from extractFeatures in self.nets.language_model:forward() (the function expects image vectors and ground-truth labels) and in self.crits.lm_crit:forward()?
Has anyone got a working demo of this by chance?