
Investigate GLaMM

abrichr opened this issue 1 year ago • 0 comments

Feature request

How can we incorporate https://mbzuai-oryx.github.io/groundingLMM/ ?

Motivation

Grounding Large Multimodal Model (GLaMM) is an end-to-end trained LMM that provides visual grounding capabilities with the flexibility to process both image and region inputs. This enables the new unified task of Grounded Conversation Generation, which combines phrase grounding, referring expression segmentation, and vision-language conversation. Equipped with detailed region understanding, pixel-level grounding, and conversational abilities, GLaMM can interact with user-provided visual inputs at multiple levels of granularity (objects, object parts, attributes, relationships, and holistic scene understanding).
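To make the potential integration concrete, here is a minimal sketch of how OpenAdapt might consume GLaMM-style grounded output. Everything here is hypothetical: the `GroundedPhrase` type, the bounding-box representation (GLaMM actually produces pixel-level masks), and the `find_target` helper are illustrative assumptions, not GLaMM's real API. The idea is that a grounded reply ties each phrase to screen coordinates, which OpenAdapt could use to locate UI elements in a screenshot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, List


@dataclass
class GroundedPhrase:
    """Hypothetical container: a phrase from the model's reply tied to a
    region of the screenshot (a bounding box here for brevity; GLaMM
    produces pixel-level segmentation masks)."""
    text: str
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in screenshot pixels


def find_target(phrases: List[GroundedPhrase], query: str) -> Optional[Tuple[int, int, int, int]]:
    """Return the region of the first grounded phrase matching the query,
    or None if no phrase matches."""
    for p in phrases:
        if query.lower() in p.text.lower():
            return p.box
    return None


# Hypothetical grounded reply to "Where is the submit button?":
phrases = [
    GroundedPhrase("the search field", (100, 40, 500, 80)),
    GroundedPhrase("the submit button", (520, 40, 600, 80)),
]

print(find_target(phrases, "submit button"))  # (520, 40, 600, 80)
```

A downstream OpenAdapt action could then, for example, click the center of the returned region.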

abrichr · Nov 11 '23 20:11