OpenAdapt
Investigate GLaMM
Feature request
How can we incorporate GLaMM (https://mbzuai-oryx.github.io/groundingLMM/) into OpenAdapt?
Motivation
Grounding Large Multimodal Model (GLaMM) is an end-to-end trained LMM that provides visual grounding capabilities and can process both image and region inputs. This enables the new unified task of Grounded Conversation Generation, which combines phrase grounding, referring expression segmentation, and vision-language conversation. With detailed region understanding, pixel-level grounding, and conversational abilities, GLaMM can interact with user-provided visual inputs at multiple levels of granularity (objects, object parts, attributes, relationships, and holistic scene understanding).
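
A minimal sketch of what an OpenAdapt-side adapter for GLaMM's Grounded Conversation Generation might look like. Everything below is hypothetical: `GLaMMAdapter`, `GroundedPhrase`, and `GroundedReply` are illustrative names, not part of OpenAdapt or the GLaMM codebase, and the inference call is stubbed out (the released checkpoints, e.g. `MBZUAI/GLaMM-FullScope`, use a custom architecture that would be loaded via GLaMM's own code rather than a generic library call). The point is the data flow: screenshot + prompt in, conversational text with per-phrase segmentation masks out.

```python
from dataclasses import dataclass, field


@dataclass
class GroundedPhrase:
    """A phrase in the reply tied to a pixel-level segmentation mask."""
    text: str      # e.g. "Submit button"
    mask_rle: str  # run-length-encoded binary mask (hypothetical encoding)


@dataclass
class GroundedReply:
    """Output of Grounded Conversation Generation: text plus per-phrase masks."""
    text: str
    phrases: list[GroundedPhrase] = field(default_factory=list)


class GLaMMAdapter:
    """Hypothetical adapter wrapping a GLaMM checkpoint for OpenAdapt.

    Grounded Conversation Generation takes an image and a prompt and returns
    text in which noun phrases are grounded to segmentation masks -- the kind
    of signal OpenAdapt could use to map model output back onto on-screen
    UI elements.
    """

    def __init__(self, checkpoint: str = "MBZUAI/GLaMM-FullScope"):
        # Real model loading is elided; it would go through GLaMM's own
        # inference code rather than a generic AutoModel call.
        self.checkpoint = checkpoint

    def prompt(self, screenshot_png: bytes, prompt: str) -> GroundedReply:
        # Placeholder: a real implementation would run GLaMM inference here.
        return GroundedReply(
            text="The screen shows a form with a Submit button.",
            phrases=[GroundedPhrase(text="Submit button", mask_rle="...")],
        )


if __name__ == "__main__":
    adapter = GLaMMAdapter()
    reply = adapter.prompt(screenshot_png=b"", prompt="Describe the screen.")
    for phrase in reply.phrases:
        print(phrase.text)
```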