CogVLM
Inquiry on CogVLM's capacity for comprehension
Suppose CogVLM is given an image of people with labelled bounding boxes around their faces, where each label gives the person's name, race, sex, and dominant facial emotion. Can CogVLM coherently describe the people, with the details for each individual matching the label on that individual's bounding box? For example, if someone's bounding box is labelled (John, black, man, irritated), would CogVLM be able to produce something like: "The image contains a man called John; he is black and his dominant emotion is irritation."? Is this something CogVLM can be prompted to do?
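To make the expected behaviour concrete, here is a minimal Python sketch of the label format and the kind of sentence I am hoping for. It does not call CogVLM at all. The names `FaceLabel`, `parse_label`, `expected_description`, and `build_prompt` are hypothetical helpers made up for illustration, and the prompt wording is only a guess at what might work.

```python
from dataclasses import dataclass


@dataclass
class FaceLabel:
    """One bounding-box label: (name, race, sex, dominant emotion)."""
    name: str
    race: str
    sex: str
    emotion: str


def parse_label(text: str) -> FaceLabel:
    """Parse a label such as '(John, black, man, irritated)'."""
    parts = [p.strip() for p in text.strip().strip("()").split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(parts)}")
    return FaceLabel(*parts)


def expected_description(label: FaceLabel) -> str:
    """Render the sentence the model would ideally produce for one label."""
    return (
        f"The image contains a {label.sex} called {label.name}, "
        f"who is {label.race} and whose dominant emotion is {label.emotion}."
    )


def build_prompt() -> str:
    """Hypothetical prompt text; the exact wording is an assumption."""
    return (
        "Each face in the image has a bounding box labelled "
        "(name, race, sex, dominant emotion). Describe every person "
        "using only the information written on their own label."
    )


if __name__ == "__main__":
    label = parse_label("(John, black, man, irritated)")
    print(build_prompt())
    print(expected_description(label))
```

Comparing the model's answer against `expected_description` for each label would be one simple way to check whether the text read from the boxes matches the description.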