Qwen-VL
Extracting Unimodal Features
Hello! I am trying to use Qwen-VL to extract unimodal features for a given input image and its accompanying text query. How can this be done? I am aware that models like BLIP-2 expose a direct API (extract_features) for this, but what is the equivalent in Qwen-VL?