
Issue with running the StarCoder model on Mac M2 with the Transformers library in a CPU environment

Open · code2graph opened this issue 2 years ago · 1 comment

I'm attempting to run the StarCoder model on a Mac M2 with 32 GB of memory, using the Transformers library in a CPU-only environment. Even with load_in_8bit=True set, I'm encountering an error during execution. Below is the relevant code:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model = AutoModelForCausalLM.from_pretrained(checkpoint,
                                             device_map="auto",
                                             load_in_8bit=True)
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)

print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))

When running the above, I receive the following warnings and exception:

Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8.
Warning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.

Error:

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules
in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to
`from_pretrained`.
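
For reference, my understanding of the path the error suggests is sketched below: most modules stay quantized in int8 on a GPU, while the modules named in a custom device_map are kept in fp32 on the CPU. This still requires a CUDA-capable GPU for the int8 layers, so it doesn't seem applicable to my CPU-only setup. The module names here are illustrative, and llm_int8_enable_fp32_cpu_offload is the BitsAndBytesConfig parameter that corresponds to the flag named in the message.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative split: int8 weights on GPU 0, lm_head kept in fp32 on the CPU.
device_map = {
    "transformer": 0,
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    device_map=device_map,
    quantization_config=BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_enable_fp32_cpu_offload=True,
    ),
)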

code2graph · Aug 30, 2023

Hi, I believe you'll need a GPU to quantize your model; bitsandbytes 8-bit quantization requires CUDA. If you're running on a CPU, you shouldn't set load_in_8bit=True. Please refer to this part of the documentation for further details.
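
For example, a minimal CPU-only variant of the snippet above simply drops the quantization arguments. This is a sketch, not a tested recipe: StarCoder has roughly 15.5B parameters, so float32 weights alone need about 60 GB; torch.bfloat16 halves that, but it may still be tight on 32 GB of RAM, in which case a smaller checkpoint is the practical option.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Load the unquantized weights directly on the CPU; no bitsandbytes involved.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # roughly halves memory vs. float32
).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))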

ArmelRandy · Sep 01, 2023