Input template for Transformers vision language models?
Hi,
I'm trying to constrain the generation of my VLMs using this repo; however, I can't figure out how to customize the pipeline for handling inputs (query + image). While it is documented as
from guidance import models, gen, image, user, assistant

gemini = models.VertexAI("gemini-pro-vision")
with user():
    lm = gemini + "What is this a picture of?" + image("longs_peak.jpg")
with assistant():
    lm += gen("answer")
for VertexAI models (here Gemini), the same pattern does not carry over to Transformers models. Hence:
model = models.Transformers("openbmb/MiniCPM-Llama3-V-2_5")
with user():
    lm = model + "What is this a picture of?" + image("longs_peak.jpg")
with assistant():
    lm += gen("answer")
results in:
TypeError: MiniCPMV.forward() missing 1 required positional argument: 'data'
while trying with "microsoft/Phi-3-vision-128k-instruct" results in: ValueError: The tokenizer being used is unable to convert a special character in ’•¶∂ƒ˙∆£Ħ爨ൠᅘ∰፨.
(I also tried manually importing the model and the tokenizer and passing them to the guidance.models call, but the error does not change.)
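For context, a minimal sketch of what passing them manually looks like (the AutoModel/AutoTokenizer calls and the trust_remote_code flag are assumptions, since this checkpoint ships custom modelling code):
from transformers import AutoModel, AutoTokenizer
from guidance import models

model_id = "openbmb/MiniCPM-Llama3-V-2_5"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
hf_model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
# Passing the already-loaded objects raises the same TypeError as above
lm = models.Transformers(model=hf_model, tokenizer=tokenizer)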
Is it possible to specify/customize the pipeline for reading inputs with such models?
Thanks
Hi @vpellegrain -- we're in the process of revamping our support for image inputs, but @nking-1 is looking into this right now :). We should have updates on this front shortly!
Got this error from a non-vision model here.
from guidance import models
model_id = 'THUDM/glm-4-9b-chat'
glm_model = models.Transformers(model_id, device_map='auto', trust_remote_code=True)
The error message is the same as the first post here.
Same here; I got the issue while using "microsoft/Phi-3-medium-4k-instruct".
@dittops, are you trying to use a vision input for Phi-3, or just doing plain text generation? We're still working on multimodal support -- will update here when we have the image function working again :).
@liqul -- Thanks for sharing this with us! Tagging @riedgar-ms who might be able to take a look
Facing the same issue with another non-vision model.
ValueError: The tokenizer being used is unable to convert a special character in ’•¶∂ƒ˙∆£Ħ爨ൠᅘ∰፨. For models with sentencepiece based tokenizers (e.g. llama, phi-3-mini), installing sentencepiece often fixes this issue (pip install sentencepiece).
sentencepiece is already installed (I restarted the Jupyter kernel). I am initialising the model like this:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "lmsys/vicuna-7b-v1.5-16k"
hf_token = "<your-hugging-face-token>"
tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(model_id, token=hf_token, device_map="auto", load_in_4bit=True)
from guidance.models import Transformers as GuidanceTransformers
guided_lm = GuidanceTransformers(model=model, tokenizer=tokenizer, echo=False)
Here are some of the relevant installed Python packages:
transformers==4.42.4
accelerate==0.32.1
pyarrow==16.1.0
guidance==0.1.15
bitsandbytes==0.43.1
sentencepiece==0.2.0
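A minimal sanity check, in case it helps, that the running kernel actually imports the same sentencepiece that pip reports (this snippet is just a suggested verification, not part of my setup code):
import sentencepiece
print(sentencepiece.__version__)  # expect 0.2.0, matching the list above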
Same issue on multiple text-only models: Qwen/Qwen2-7B-Instruct, Mihaiii/Pallas-0.5, google/gemma-2-9b-it.
The only one I managed to run is meta-llama/Meta-Llama-3.1-8B-Instruct.
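For reference, a minimal sketch of the kind of call involved (device_map and the gen arguments here are illustrative, not the exact code I ran):
from guidance import models, gen

# Construction fails with the tokenizer ValueError for e.g. Qwen/Qwen2-7B-Instruct:
# lm = models.Transformers("Qwen/Qwen2-7B-Instruct", device_map="auto")

# The same call works for Meta-Llama-3.1-8B-Instruct:
lm = models.Transformers("meta-llama/Meta-Llama-3.1-8B-Instruct", device_map="auto")
lm += "Hello, " + gen(max_tokens=10)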