
Input template for Transformers vision language models?

Open vpellegrain opened this issue 1 year ago • 6 comments

Hi,

I'm trying to constrain the generation of my VLMs using this repo, but I can't figure out how to customize the pipeline for handling multimodal inputs (query + image). The documented usage for VertexAI models (Gemini here) is:

from guidance import models, user, assistant, gen, image

gemini = models.VertexAI("gemini-pro-vision")

with user():
    lm = gemini + "What is this a picture of?" + image("longs_peak.jpg")

with assistant():
    lm += gen("answer")

This pattern does not carry over to Transformers models, though. For example:

model = models.Transformers("openbmb/MiniCPM-Llama3-V-2_5")

with user():
    lm = model + "What is this a picture of?" + image("longs_peak.jpg")

with assistant():
    lm += gen("answer")

results in:

TypeError: MiniCPMV.forward() missing 1 required positional argument: 'data'

Trying "microsoft/Phi-3-vision-128k-instruct" instead results in:

ValueError: The tokenizer being used is unable to convert a special character in ’•¶∂ƒ˙∆£Ħ爨ൠᅘ∰፨.

(I also tried loading the model and tokenizer manually and passing them to the guidance.models call, but the error is the same.)
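
For clarity, the manual loading I tried was roughly along these lines (a sketch; the exact kwargs may differ):

from transformers import AutoModel, AutoTokenizer
from guidance import models

model_id = "openbmb/MiniCPM-Llama3-V-2_5"

# MiniCPM-V ships custom modeling code, hence trust_remote_code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
hf_model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Pass the preloaded objects instead of the model ID string
lm = models.Transformers(model=hf_model, tokenizer=tokenizer)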

Is it possible to specify or customize the input-processing pipeline for such models?

Thanks

vpellegrain avatar Jun 04 '24 16:06 vpellegrain

Hi @vpellegrain -- we're in the process of revamping our support for image inputs, but @nking-1 is looking into this right now :). We should have updates on this front shortly!

Harsha-Nori avatar Jun 04 '24 16:06 Harsha-Nori

I got the same error from a non-vision model:

from guidance import models

model_id = 'THUDM/glm-4-9b-chat'
glm_model = models.Transformers(model_id, device_map='auto', trust_remote_code=True)

The error message is the same as in the first post.

liqul avatar Jun 12 '24 05:06 liqul

Same here; I got the issue while using "microsoft/Phi-3-medium-4k-instruct".

dittops avatar Jun 12 '24 13:06 dittops

@dittops, are you trying to use a vision input for Phi-3, or just doing plain text generation? We're still working on multimodal support -- will update here when we have the image function working again :).

@liqul -- Thanks for sharing this with us! Tagging @riedgar-ms who might be able to take a look

Harsha-Nori avatar Jun 12 '24 14:06 Harsha-Nori

Facing the same issue with another non-vision model:

ValueError: The tokenizer being used is unable to convert a special character in ’•¶∂ƒ˙∆£Ħ爨ൠᅘ∰፨. For models with sentencepiece based tokenizers (e.g. llama, phi-3-mini), installing sentencepiece often fixes this issue (pip install sentencepiece).

Sentencepiece is already installed (and I restarted the Jupyter kernel). I am initialising it like this:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5-16k"
hf_token = "<your-hugging-face-token>"

tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(model_id, token=hf_token, device_map="auto", load_in_4bit=True)

from guidance.models import Transformers as GuidanceTransformers
guided_lm = GuidanceTransformers(model=model, tokenizer=tokenizer, echo=False)

Here are some of the relevant installed Python packages:

transformers==4.42.4
accelerate==0.32.1
pyarrow==16.1.0
guidance==0.1.15
bitsandbytes==0.43.1
sentencepiece==0.2.0
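
As a quick sanity check, this round-trips the special string from the error message through the tokenizer (a rough approximation only; guidance's internal check builds a byte decoder and works at the UTF-8 byte level):

from transformers import AutoTokenizer

# The special string guidance uses to validate its tokenizer support
special = "’•¶∂ƒ˙∆£Ħ爨ൠᅘ∰፨"

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5-16k")
ids = tokenizer(special)["input_ids"]
decoded = tokenizer.decode(ids, skip_special_tokens=True)

# False here suggests the vocab cannot faithfully represent these characters
print(decoded == special)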

yash98 avatar Aug 04 '24 11:08 yash98

Same issue on multiple text-only models: Qwen/Qwen2-7B-Instruct, Mihaiii/Pallas-0.5, google/gemma-2-9b-it.

The only one I managed to run is meta-llama/Meta-Llama-3.1-8B-Instruct.
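
For reproduction, a minimal text-only run like this is enough to trigger the error on the affected models (a sketch; swap in any of the model IDs above):

from guidance import models, gen

lm = models.Transformers("Qwen/Qwen2-7B-Instruct")
lm += "The capital of France is " + gen(max_tokens=5)
print(lm)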

Mihaiii avatar Aug 17 '24 18:08 Mihaiii