
MiniCPM V2.6 Support

Open diptanu opened this issue 1 year ago • 5 comments

Hey guys, I am trying to use MiniCPM-V 2.6 with Outlines: https://huggingface.co/openbmb/MiniCPM-V-2_6

I am using the outlines.models.transformers_vision API to load the model, but I can't find the model class defined anywhere in the transformers codebase. Any idea what I should use for the model_class arg?

diptanu avatar Oct 13 '24 22:10 diptanu

Does AutoModelForCausalLM work? https://huggingface.co/openbmb/MiniCPM-V-2_6/blob/main/config.json#L10
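If it does, a minimal (untested) sketch of the call might look like this; the model_kwargs/processor_kwargs parameter names are assumptions, and trust_remote_code is needed because MiniCPM-V ships custom modeling code:

import outlines
from transformers import AutoModelForCausalLM

# Untested sketch; verify the exact transformers_vision signature.
model = outlines.models.transformers_vision(
    "openbmb/MiniCPM-V-2_6",
    model_class=AutoModelForCausalLM,
    model_kwargs={"trust_remote_code": True},
    processor_kwargs={"trust_remote_code": True},
)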

lapp0 avatar Oct 13 '24 23:10 lapp0

@diptanu I've monkey-patched TransformersVision.generate because the model doesn't accept image_sizes:

from outlines.models import TransformersVision

original_generate = TransformersVision.generate


def patched_generate(self, prompts, media, generation_parameters, logits_processor, sampling_parameters):
    inputs = self.processor(
        text=prompts, images=media, padding=True, return_tensors="pt"
    ).to(self.model.device)

    # MiniCPM-V's generate does not accept image_sizes, so drop it.
    inputs.pop('image_sizes', None)

    generation_kwargs = self._get_generation_kwargs(
        prompts,
        generation_parameters,
        logits_processor,
        sampling_parameters,
    )
    generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)

    # If a single prompt was passed, drop the batch dimension.
    if isinstance(prompts, str):
        generated_ids = generated_ids.squeeze(0)

    return self._decode_generation(generated_ids)


TransformersVision.generate = patched_generate

Then I loaded the model as follows:

import torch
from transformers import AutoModel, AutoProcessor, AutoTokenizer
from outlines import models

model_id = "openbmb/MiniCPM-V-2_6"

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation='flash_attention_2',  # use attn_implementation='sdpa' to disable flash attention
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

outlines_model = models.TransformersVision(model, tokenizer=tokenizer, processor=processor)
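For reference, a rough sketch of how generation is then invoked (the prompt and image below are placeholders):

import outlines
from PIL import Image

image = Image.open("frame.png")  # placeholder image
generator = outlines.generate.text(outlines_model)
result = generator("(<image>./</image>)\nDescribe the image.", [image])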

However, the model doesn't generate anything. Have you solved the problem?

2U1 avatar Dec 10 '24 00:12 2U1

I've solved the problem by passing Outlines' JSON logits processor directly to the model's own generate method.

from typing import List

import torch
import transformers
import outlines
from pydantic import BaseModel, Field
from transformers import AutoModel, AutoProcessor, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"

model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, attn_implementation='flash_attention_2')
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

class Event(BaseModel):
    # TrafficEvent and Weather are application-specific models defined elsewhere.
    event: TrafficEvent
    weather: Weather
    reasoning_step: List[str] = Field(..., title="The reasoning steps leading to the final conclusion.")

outlines_tokenizer = outlines.models.TransformerTokenizer(tokenizer)
event_logit_processor = outlines.processors.JSONLogitsProcessor(
    Event, outlines_tokenizer
)
logits_processor = transformers.LogitsProcessorList([event_logit_processor])

# encoded_frame_groups, user_input, system_prompt and max_slice_num are
# defined elsewhere in the application.
for groups in encoded_frame_groups:
    user_text = "(<image>./</image>)\n" * len(groups) + user_input

    messages = [
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": user_text,
        }
    ]

    prompts_list = [processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)]
    print(prompts_list)
    images_list = [groups]

    inputs = processor(
        prompts_list,
        images_list,
        max_slice_num=max_slice_num,
        use_image_id=False,
        return_tensors="pt",
        max_lengths=8192
    ).to(model.device)

    generation_config = {
        "top_p": 0.1,
        "temperature": 0.001,
        "do_sample": True,
        "repetition_penalty": 1.05
    }

    # MiniCPM-V's generate does not accept image_sizes, so drop it.
    inputs.pop('image_sizes', None)

    # With decode_text=True, MiniCPM-V's custom generate returns decoded text
    # rather than token ids.
    generated_ids = model.generate(
        **inputs,
        tokenizer=tokenizer,
        max_new_tokens=300,
        vision_hidden_states=None,
        stream=False,
        decode_text=True,
        logits_processor=logits_processor,
        **generation_config
    )

    print(generated_ids)
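Note that outlines.processors.JSONLogitsProcessor plugs into plain transformers generation, independent of the vision stack. A minimal text-only sketch of the same pattern (gpt2 is used purely as an example model):

import outlines
import transformers
from pydantic import BaseModel

class Answer(BaseModel):
    answer: str

tok = transformers.AutoTokenizer.from_pretrained("gpt2")
lm = transformers.AutoModelForCausalLM.from_pretrained("gpt2")

json_processor = outlines.processors.JSONLogitsProcessor(
    Answer, outlines.models.TransformerTokenizer(tok)
)

out = lm.generate(
    **tok("Answer in JSON: ", return_tensors="pt"),
    logits_processor=transformers.LogitsProcessorList([json_processor]),
    max_new_tokens=50,
)
print(tok.decode(out[0], skip_special_tokens=True))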

2U1 avatar Dec 10 '24 00:12 2U1

Would this code work with the new version?

https://github.com/OpenBMB/MiniCPM-o?tab=readme-ov-file#multi-turn-conversation

elloza avatar Jan 14 '25 08:01 elloza

@elloza I haven't looked at how model.chat works in the new model. If it works the same as in the older one, then this should work.

2U1 avatar Jan 14 '25 15:01 2U1

The problem should be solved in v1.
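For anyone landing here later, a rough, unverified sketch of what this might look like with the v1 API (from_transformers exists in v1, but the exact multimodal input shape is an assumption; check the current docs):

import outlines
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "openbmb/MiniCPM-V-2_6"

# Assumption: v1's from_transformers builds a multimodal model when given a
# processor; Event is the Pydantic schema defined earlier in this thread.
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True),
    AutoProcessor.from_pretrained(model_id, trust_remote_code=True),
)

result = model({"text": "Describe the image.", "images": Image.open("frame.png")}, Event)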

rlouf avatar Jun 25 '25 20:06 rlouf