MiniCPM-V 2.6 Support
Hey guys, I am trying to use MiniCPM-V 2.6 with Outlines, using https://huggingface.co/openbmb/MiniCPM-V-2_6
I am using the outlines.models.transformers_vision API to load the model, but I can't find the model class defined anywhere in the transformers codebase. Any idea what I should use for the model_class arg?
Does AutoModelForCausalLM work? https://huggingface.co/openbmb/MiniCPM-V-2_6/blob/main/config.json#L10
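Untested sketch: the config lists the custom MiniCPMV architecture, so AutoModel with trust_remote_code=True is probably closer than AutoModelForCausalLM. Whether transformers_vision forwards trust_remote_code to from_pretrained via model_kwargs depends on the Outlines version, so treat the kwargs below as assumptions:

import torch
from transformers import AutoModel
from outlines import models

# Assumption: model_class and model_kwargs are forwarded to from_pretrained
# (true for recent 0.x releases of Outlines, but worth checking your version).
model = models.transformers_vision(
    "openbmb/MiniCPM-V-2_6",
    model_class=AutoModel,
    model_kwargs={"trust_remote_code": True, "torch_dtype": torch.bfloat16},
    device="cuda",
)

If the processor also needs trust_remote_code and the factory doesn't forward it, loading the model, tokenizer, and processor yourself and constructing models.TransformersVision directly (as in the reply below) is the fallback.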
@diptanu
I've monkey patched it because the model doesn't use image_sizes:
from outlines.models import TransformersVision

original_generate = TransformersVision.generate  # keep a reference to the original

def patched_generate(self, prompts, media, generation_parameters, logits_processor, sampling_parameters):
    inputs = self.processor(
        text=prompts, images=media, padding=True, return_tensors="pt"
    ).to(self.model.device)
    # MiniCPM-V-2_6's generate() does not accept image_sizes, so drop it
    inputs.pop('image_sizes', None)
    generation_kwargs = self._get_generation_kwargs(
        prompts,
        generation_parameters,
        logits_processor,
        sampling_parameters,
    )
    generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)
    if isinstance(prompts, str):
        generated_ids = generated_ids.squeeze(0)
    return self._decode_generation(generated_ids)

TransformersVision.generate = patched_generate
Then I've loaded the model as below.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, attn_implementation='flash_attention_2')  # use attn_implementation='sdpa' to disable flash attention
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
outlines_model = models.TransformersVision(model, tokenizer=tokenizer, processor=processor)
However, the model doesn't generate anything. Have you solved the problem?
I've solved the problem by using a logits processor.
import outlines
import torch
import transformers
from pydantic import BaseModel, Field
from transformers import AutoModel, AutoProcessor, AutoTokenizer
from typing import List

model_id = "openbmb/MiniCPM-V-2_6"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, attn_implementation='flash_attention_2')
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# TrafficEvent and Weather are my own schema types (definitions omitted here)
class Event(BaseModel):
    event: TrafficEvent
    weather: Weather
    reasoning_step: List[str] = Field(..., title="The reasoning steps leading to the final conclusion.")

outlines_tokenizer = outlines.models.TransformerTokenizer(tokenizer)
event_logit_processor = outlines.processors.JSONLogitsProcessor(
    Event, outlines_tokenizer
)
logits_processor = transformers.LogitsProcessorList([event_logit_processor])
# encoded_frame_groups, user_input, system_prompt and max_slice_num come from my own preprocessing (not shown)
for groups in encoded_frame_groups:
    user_text = "(<image>./</image>)\n" * len(groups) + user_input
    messages = [
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": user_text,
        }
    ]
    prompts_list = [processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)]
    print(prompts_list)
    images_list = [groups]
    inputs = processor(
        prompts_list,
        images_list,
        max_slice_num=max_slice_num,
        use_image_id=False,
        return_tensors="pt",
        max_lengths=8192
    ).to(model.device)
    generation_config = {
        "top_p": 0.1,
        "temperature": 0.001,
        "do_sample": True,
        "repetition_penalty": 1.05
    }
    # the model's generate() does not accept image_sizes, so drop it
    inputs.pop('image_sizes', None)
    generated_ids = model.generate(
        **inputs,
        tokenizer=tokenizer,
        max_new_tokens=300,
        vision_hidden_states=None,
        stream=False,
        decode_text=True,
        logits_processor=logits_processor,
        **generation_config
    )
    print(generated_ids)
Would this code work with the new version?
https://github.com/OpenBMB/MiniCPM-o?tab=readme-ov-file#multi-turn-conversation
@elloza I haven't looked at how model.chat works in the new model. If it works the same as the older one, then it should work.
The problem should be solved in v1.