
NameError: name 'MistralCausalLM' is not defined


❓Question: I am trying to replicate the WWDC24 session "Bring your machine learning and AI models to Apple silicon":

https://developer.apple.com/videos/play/wwdc2024/10159/

In the walkthrough they show a code example that converts a Mistral model. When I try to replicate the same code I get this error:

NameError: name 'MistralCausalLM' is not defined

My script:

import torch
from torch import nn
import numpy as np
import coremltools as ct
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the class for the Stateful Mistral model
class StatefulMistral(torch.nn.Module):
    def __init__(self, modelPath, batchSize=1, contextSize=2048):
        super().__init__()
        self.model = MistralCausalLM.from_pretrained(modelPath)
        
        self.register_buffer("keyCache", torch.zeros(self.model.kvCacheShape))
        self.register_buffer("valueCache", torch.zeros(self.model.kvCacheShape))




    def forward(self, inputIds, causalMask):
        return self.model(inputIds, causalMask, self.keyCache, self.valueCache).logits


torch_model = StatefulMistral("mistralai/Mistral-7B-Instruct-v0.2").eval()

Paramstr avatar Jun 25 '24 01:06 Paramstr

Did you find a solution? I found this exact class name in Keras, but that class does not have the from_pretrained method.

PabloButron avatar Jun 29 '24 19:06 PabloButron

The sample util code is getting some final touches and will be released once completed. Thank you for your patience!

junpeiz avatar Jul 02 '24 02:07 junpeiz

You have to import it like this:

from transformers import AutoModelForCausalLM, AutoTokenizer, MistralForCausalLM

Note the class is named MistralForCausalLM, not MistralCausalLM. But you will quickly run into even more issues after that (at least I did).
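For reference, here is the snippet from the issue with just that fix applied. This is a minimal sketch: it resolves the NameError, but kvCacheShape is not a transformers attribute (it comes from Apple's demo utilities), so the stateful key/value cache wiring is omitted here.

import torch
from transformers import MistralForCausalLM

class StatefulMistral(torch.nn.Module):
    def __init__(self, modelPath):
        super().__init__()
        # MistralForCausalLM is the actual transformers class name
        self.model = MistralForCausalLM.from_pretrained(modelPath)

    def forward(self, inputIds, causalMask):
        # plain transformers forward pass; the WWDC demo's key/value cache
        # buffers depend on its (then-unreleased) demo utilities
        return self.model(inputIds, attention_mask=causalMask).logits

torch_model = StatefulMistral("mistralai/Mistral-7B-Instruct-v0.2").eval()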

alexeichhorn avatar Jul 09 '24 21:07 alexeichhorn

I'm still struggling too. Also looks like there's a problem with kvCache.

daltheman avatar Jul 11 '24 16:07 daltheman

@junpeiz

The sample util code is getting some final touches and will be released once completed. Thank you for your patience!

If I'm not mistaken, the code you use (demo_utils) is not related to Hugging Face transformers models. Like the others above, I'm trying to convert Hugging Face transformers LLMs, but from scratch, without depending too much on your example code. I'm running into issues where slices of the key/value cache tensors are updated.

(see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L788)

cache_position = cache_kwargs.get("cache_position")
k_out = self.key_cache[layer_idx]
v_out = self.value_cache[layer_idx]

k_out[:, :, cache_position] = key_states
v_out[:, :, cache_position] = value_states

The assignment causes issues in

@register_torch_op
def index_put(context, node):

because the cache_position tensor contains a whole slice, typically range(0, n) with n < context_size, and index_put does not seem to support partial slices (only the full slice). It crashes while trying to concatenate the non-scalar cache_position into an index tensor consisting of one scalar per dim.

    add_op(context, node)
  File "coremltools/converters/mil/frontend/torch/ops.py", line 3989, in **index_put**
    begin = mb.concat(values=begin, axis=0)
  File "coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "coremltools/converters/mil/mil/builder.py", line 202, in _add_op
    new_op.type_value_inference()
  File "coremltools/converters/mil/mil/operation.py", line 257, in type_value_inference
    output_types = self.type_inference()
  File "coremltools/converters/mil/mil/ops/defs/iOS15/tensor_operation.py", line 1011, in type_inference
    raise ValueError(msg.format(v.name, v.rank, rank))
ValueError: Input squeeze_0 has rank 1 != other inputs rank 0
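To illustrate, the pattern boils down to this standalone snippet (the shapes are made up; tracing and converting a model containing this assignment is what hits the index_put path):

import torch

# (batch, heads, context_size, head_dim) cache with a fixed context length
cache = torch.zeros(1, 8, 2048, 64)
key_states = torch.randn(1, 8, 16, 64)   # new keys for the first 16 positions
cache_position = torch.arange(16)        # range(0, n) with n < context_size
# writing through an index tensor lowers to torch.index_put during conversion
cache[:, :, cache_position] = key_states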

Here is also a screenshot from the debugger showing the tensor dimensions. [Screenshot 2024-07-15 at 15.49.58: debugger view of the index tensor shapes] is0 and is20 should end up with the same inferred shapes, by the way.

Questions: Could you give some pointers on how to model the attention cache in a way that is compatible with CoreML? How did you conceptually implement cache retrieval and update in your example code?

RobertBiehl avatar Jul 15 '24 13:07 RobertBiehl

Hello, the demo code for converting and running the Mistral-7B model has been released at https://huggingface.co/blog/mistral-coreml.

Feel free to try it and let us know if you have any questions. Thanks!

junpeiz avatar Jul 22 '24 22:07 junpeiz
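For readers landing here: conceptually, the released demo sidesteps the index_put problem by updating the cache with contiguous slice writes on pre-allocated buffers rather than index-tensor writes. A rough sketch of that idea (simplified and illustrative, not the demo's actual API; see the blog post for the real implementation):

import torch

class SliceUpdateKVCache(torch.nn.Module):
    def __init__(self, num_layers=32, num_heads=8, context_size=2048, head_dim=128):
        super().__init__()
        shape = (num_layers, 1, num_heads, context_size, head_dim)
        self.register_buffer("keyCache", torch.zeros(shape))
        self.register_buffer("valueCache", torch.zeros(shape))

    def update(self, k_state, v_state, layer_idx, begin, end):
        # contiguous slice assignment with scalar bounds converts as a
        # slice update rather than a partial index_put
        self.keyCache[layer_idx, :, :, begin:end] = k_state
        self.valueCache[layer_idx, :, :, begin:end] = v_state
        return (self.keyCache[layer_idx, :, :, :end],
                self.valueCache[layer_idx, :, :, :end])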

Closing the issue. See reply above: https://github.com/apple/coremltools/issues/2256#issuecomment-2243885512.

1duo avatar Jul 22 '24 22:07 1duo

@junpeiz can this run on the iPhone NPU?

enduringstack avatar Sep 24 '24 13:09 enduringstack

@junpeiz can this run on the iPhone NPU?

@enduringstack The 7B version was benchmarked on Mac, but feel free to try a smaller variant on the phone. Thanks!

junpeiz avatar Sep 24 '24 15:09 junpeiz
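For anyone experimenting with that, Core ML lets you request the Neural Engine when loading a converted model. A minimal sketch (the .mlpackage path is a placeholder; on-device performance for a 7B model is untested here):

import coremltools as ct

# request CPU + Neural Engine; Core ML falls back to CPU for unsupported ops
model = ct.models.MLModel(
    "StatefulMistral7BInstructFP16.mlpackage",  # placeholder path
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)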