MiniGPT-5
How does the model output interleaved text and images from multimodal input?
Does the model require further fine-tuning? I'm wondering why the playground uses a 'for' loop to generate a story.