Hessa Alawwad
Thank you for the great model. I wonder how I can get the multimodal embedding of different inputs, like an image and its caption, using ImageBind? If I can get that...
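For questions like this one: ImageBind returns one embedding per modality in a shared space (1024-d for the huge variant), so a single joint vector for an image and its caption is typically built by fusing the per-modality embeddings afterwards. A minimal sketch, with random tensors standing in for real ImageBind outputs and averaged L2-normalized vectors as the fusion rule (that rule is a common choice, not part of ImageBind itself):

```python
import torch
import torch.nn.functional as F

# Stand-ins for per-modality ImageBind embeddings (imagebind_huge
# projects every modality into a shared 1024-d space).
image_emb = torch.randn(1, 1024)
text_emb = torch.randn(1, 1024)

def fuse(*embs):
    # L2-normalize each modality, average, then re-normalize so the
    # fused vector lives on the same unit sphere as the inputs.
    normed = [F.normalize(e, dim=-1) for e in embs]
    fused = torch.stack(normed).mean(dim=0)
    return F.normalize(fused, dim=-1)

joint = fuse(image_emb, text_emb)
print(joint.shape)  # one unit-norm multimodal vector per input pair
```

The same `fuse` call extends to more modalities (audio, depth, ...) since ImageBind places them all in the one space.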
Hello, thank you for the great work. I have been trying to experiment with the model and see how it works. My question is: can I use Llama 3.2 Vision to cover...
Hello, so I am trying to embed text using CLIP. I got an error that my text is too long, but from the Hugging Face documentation I see that I can fix...
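For context on this one: CLIP's text encoder has a 77-token position limit, and the usual fix in Hugging Face `transformers` is to let the processor truncate. A minimal sketch (the checkpoint name and caption below are illustrative, not from the original post):

```python
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

long_text = "a very long caption " * 100  # well past CLIP's 77-token limit

# Without truncation=True, text longer than 77 tokens cannot be fed to the
# CLIP text encoder; with it, the tokenizer clips to max_length.
inputs = processor(
    text=[long_text],
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=77,
)
print(inputs["input_ids"].shape)
```

Truncation silently drops everything past the limit, so for long documents people often chunk the text and embed each chunk instead.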
Hello, I am trying the following code to test sending multiple images:

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
#...
```
Hello, I am trying to SFT-train the Llama 3.2 11B Vision Instruct model on a dataset that answers a question about an image using a context (which could be more than one...
Hello, I was wondering if I would be able to use the DataCollatorForCompletionOnlyLM to train Llama 3.2 vision model on the generated prompts only? Something like passing a response template...
Hello, I am trying to do the following:

```python
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

def getEmbeddingVector(inputs):
    with torch.no_grad():
        embedding = imagebind_model(inputs)
    for key,...
```