LLaVA-NeXT
Video captions often contain "The image"
We use LLaVA-NeXT-Video-DPO (34B).
I don't know whether it works the same way for video, but with the previous model and images you could just provide a system prompt. For example, I might use something like:
prompt = "Describe the image in search engine keyword tags"
prompt_format = "[INST] SYSTEM: You are a professional image captioner, describe images as reduced keyword tags for search engines separated by commas.\nUSER: <image>\n<prompt>[/INST]"
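A minimal sketch of how the two strings above could be combined before being sent to the model. It assumes that `<prompt>` is a plain placeholder the user fills in, while `<image>` is left untouched so the model's own preprocessing can swap in the image (or video) features. The helper `build_prompt` is hypothetical and not part of LLaVA-NeXT. Model loading and generation are not shown.

```python
def build_prompt(user_prompt: str, template: str) -> str:
    """Hypothetical helper: substitute the <prompt> placeholder, keep <image> as-is."""
    if "<prompt>" not in template:
        raise ValueError("template has no <prompt> placeholder")
    return template.replace("<prompt>", user_prompt)


prompt = "Describe the image in search engine keyword tags"

# Same template as above, split across lines only for readability.
prompt_format = (
    "[INST] SYSTEM: You are a professional image captioner, describe images as "
    "reduced keyword tags for search engines separated by commas.\n"
    "USER: <image>\n<prompt>[/INST]"
)

full_prompt = build_prompt(prompt, prompt_format)
print(full_prompt)
```

The system sentence is where the captioning style is steered. For video, one could try rewording it (for example "describe videos") to see whether that reduces "The image" in the output, though that has not been confirmed here.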
I couldn't run the model. (README.md) #43