
Computing output likelihoods with the InstructBLIP model

vishaal27 opened this issue 1 year ago · 2 comments

Hi, is it possible to get the tokenwise log-likelihood scores of different outputs from the InstructBLIP model?

The use-case would be something like: given an interleaved image/text input and a list of candidate output texts, we should be able to compute a score for each candidate and return a ranked list, rather than generating the outputs directly. This is close to how LLMs are evaluated on multiple-choice (MCQ) tasks. An example from page 6 of the T0 paper (https://arxiv.org/pdf/2110.08207.pdf):

> For tasks that involve choosing the correct completion from several options (e.g. multiple choice question answering), we follow Brown et al. (2020) and use rank classification to evaluate our model: we compute the log-likelihood of each of the target options under the fine-tuned model and select the option with the highest log-likelihood as the prediction. For simplicity, we do not apply length normalization to the log-likelihoods of the target options.

Is it straightforward to do this with InstructBLIP? Since its LLM is based on Vicuna (built with transformers), I assume the output-scoring utilities already implemented there could be reused, but I haven't dug into this yet.
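To make the request concrete, here is a minimal, model-agnostic sketch of the rank-classification scoring described above. It assumes you can already obtain per-position logits for each candidate's target tokens (e.g. from a forward pass of the language model); all function names here are hypothetical illustrations, not part of LAVIS or transformers:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one position's logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_log_likelihood(step_logits, target_ids):
    # step_logits[t]: the model's vocabulary logits at position t
    # target_ids[t]:  the candidate's token id at position t
    # Summed (not length-normalized) token log-probabilities,
    # matching the T0 setup quoted above.
    return sum(log_softmax(step_logits[t])[target_ids[t]]
               for t in range(len(target_ids)))

def rank_candidates(candidates):
    # candidates: list of (name, step_logits, target_ids) tuples.
    # Returns (name, score) pairs sorted best-first.
    scored = [(name, sequence_log_likelihood(sl, ti))
              for name, sl, ti in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy example with a 3-token vocabulary and one-token candidates:
candidates = [
    ("option_a", [[2.0, 0.0, 0.0]], [0]),  # favored by the logits
    ("option_b", [[2.0, 0.0, 0.0]], [1]),
]
ranked = rank_candidates(candidates)
```

With InstructBLIP specifically, the idea would be to feed each (input, candidate) pair through the model, take the logits over the candidate's tokens, and plug them into scoring like this; whether LAVIS exposes those logits cleanly is exactly my question.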

vishaal27 · May 12 '23 09:05