LLaMA-VID

Extract context relevancy

Open · IgnacioSan22 opened this issue 10 months ago · 0 comments

Hi, first of all, congratulations on this work. The model performs quite well given the nature of the tasks.

I want to use the model to create video summaries. I think the best approach is to determine which parts of the input video receive the highest attention or context scores, since those are what the LLM relies on to write the textual summary. However, I'm struggling to extract them. I'm currently working in the token generation function, but I'm unsure whether my code is correct. Could someone help?

My current code is attached as a screenshot.
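For anyone exploring the same idea, here is a minimal sketch of per-frame scoring. It is not the code from the screenshot and not LLaMA-VID's own API. It assumes you have already captured, for each frame, the text-query embeddings and the frame's visual patch features that feed LLaMA-VID's text-guided context attention (for example with a PyTorch forward hook), converted to plain lists, and that every frame has the same number of patches. The scoring rule (mean over queries of the log-sum-exp of scaled dot-product logits) is a heuristic chosen for illustration: softmax weights always sum to 1 per frame, so they cannot be compared across frames, whereas the log-sum-exp normaliser can. `frame_relevance` and `top_k_frames` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def frame_relevance(queries: List[Vector], patches: List[Vector]) -> float:
    """Heuristic relevance of one frame to the text queries.

    For each query, compute scaled dot-product logits against every patch
    and take their log-sum-exp (the log of the softmax normaliser). The
    frame score is the mean of these values over all queries, so a frame
    scores high when some patch matches a query strongly.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    total = 0.0
    for q in queries:
        logits = [scale * sum(qi * pi for qi, pi in zip(q, p)) for p in patches]
        m = max(logits)  # subtract the max for numerical stability
        total += m + math.log(sum(math.exp(x - m) for x in logits))
    return total / len(queries)


def top_k_frames(queries: List[Vector], frames: List[List[Vector]], k: int) -> List[int]:
    """Indices of the k highest-scoring frames, returned in temporal order."""
    scores = [frame_relevance(queries, patches) for patches in frames]
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy usage: two 2-D queries and three frames with two patches each.
queries = [[1.0, 0.0], [0.0, 1.0]]
frames = [
    [[0.0, 0.0], [0.0, 0.0]],  # matches neither query
    [[4.0, 0.0], [0.0, 0.0]],  # matches the first query only
    [[0.0, 4.0], [4.0, 0.0]],  # matches both queries
]
print(top_k_frames(queries, frames, 2))
```

The selected indices can then be mapped back to timestamps (or grouped into contiguous segments) to decide which parts of the video the summary should cover. Whether these scores line up with what the LLM actually attends to during generation is an open question that would need checking against the model itself.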

IgnacioSan22 · Apr 25 '24 13:04