bitsandbytes
Support for Apple silicon
Would it make sense for this library to support platforms other than CUDA on x64 Linux? I am specifically looking for Apple silicon support. Currently not even the CPU-only build works, since it assumes SSE2 support (there is no NEON path).
I would guess that the first step would be a full cross-platform compile (arm64), then ideally support for Metal Performance Shaders as an alternative to CUDA (assuming that is at all feasible).
I could probably contribute towards this if there is interest in making bitsandbytes multi-platform. I have some experience setting up cross-platform Python libraries.
Hi there, I'd like to contribute too, in order to get this working with Metal on the Apple M1.
This is my trace:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
dlopen(/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
dlopen(/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
--------------------------------------------------------------------------------------------
# What version of Python do you have?
import sys
import platform
import torch
has_gpu = torch.cuda.is_available()
has_mps = getattr(torch, 'has_mps', False)
print('has_mps', has_mps)
device = "mps" if has_mps else "cuda" if has_gpu else "cpu"
print(f"Python Platform: {platform.platform()}")
print(f"PyTorch Version: {torch.__version__}")
print()
print(f"Python {sys.version}")
print("GPU is", "available" if has_gpu else "NOT AVAILABLE")
print("MPS (Apple Metal) is", "AVAILABLE" if has_mps else "NOT AVAILABLE")
print(f"Target device is {device}")
----------------------------------------------------------------------------------
has_mps True
Python Platform: macOS-13.3-arm64-arm-64bit
PyTorch Version: 2.0.0
Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:38:11)
[Clang 14.0.6 ]
GPU is NOT AVAILABLE
MPS (Apple Metal) is AVAILABLE
Target device is mps
Nice to hear! It would be good to hear from the maintainers that they are at all interested in making this package cross-platform. It is very much CUDA focused at the moment.
Getting libbitsandbytes_cpu.so to compile for macOS arm64 was not at all difficult, just an exercise in moving some #ifdefs around, but CPU support would obviously need NEON (SIMD) code paths to make any sense IMHO. Then, of course, MPS support would be needed at some point (though I expect that's quite a lot more work).
I've just started looking at the unit tests and the Python libraries.
The C++ code is quite nicely structured, but the Python code would need some refactoring, since most of the calls assume CUDA (x.cuda() instead of x.to(device), etc.); a rough sketch of what I mean follows below. Also, since the CPU version does not cover 100% of the feature set, testing is going to be quite some work, as there is no real baseline. I suppose one question is whether it would make sense to have the CPU backend cover 100% of the API, even if inefficiently, just to provide a baseline that the GPU implementations could be compared against?
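To make that concrete, here is a rough sketch of the device-agnostic pattern I mean (illustrative only, not actual bitsandbytes code):

import torch

# Instead of hard-coding x = x.cuda(), pick whichever accelerator is present
# and move tensors to it explicitly.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.randn(4, 4)
x = x.to(device)  # works the same on cuda, mps, and cpu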
If pursuing this, I propose implementing cross-platform CPU support first, then tackling MPS. MPS is of course what makes it useful.
(I have the exact same setup BTW, 2021 MBP)
Edit: Specifically, here's how I imagine the unit tests would have to work https://github.com/TimDettmers/bitsandbytes/pull/257/files#diff-659bad232c71219167252c1a5ccbc427b6f54925b78741df18613c3c49aaa4c1R153
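Roughly, I'm thinking of something along these lines (a sketch only, not the actual code from that diff):

import pytest
import torch

# Collect the backends actually present on the machine running the tests.
available_devices = ["cpu"]
if torch.cuda.is_available():
    available_devices.append("cuda")
if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    available_devices.append("mps")

@pytest.fixture(params=available_devices)
def device(request):
    return torch.device(request.param)

def test_add_matches_cpu_baseline(device):
    a = torch.randn(16, 16)
    b = torch.randn(16, 16)
    expected = a + b  # CPU result as the reference baseline
    result = (a.to(device) + b.to(device)).cpu()
    torch.testing.assert_close(result, expected)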

So at least one CPU test passes on my M1 Mac :)
Please have a look at "Building on Jetson AGX Xavier Development Kit fails" #221. It addresses the same AArch64 issue, but on CUDA-supported platforms like the NVIDIA Jetson.
Wow... not to be inflammatory, but are we saying there's no immediate solution for this if you have any MacBook from the last 5 years or so? Yuck.
The Apple M1 (https://en.wikipedia.org/wiki/Apple_M1) was introduced less than 3 years ago. Things take time in the world of open source, especially when Apple hardware is involved.
When will this be done?
Looking forward to support for this too. I got the errors below when I tried to fine-tune Llama 2 7B with load_in_8bit=True enabled on my MacBook M2. PyTorch's MPS support is getting better, and I hope this project can support it as well:
File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 293, in forward
using_igemmlt = supports_igemmlt(A.device) and not state.force_no_igemmlt
File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 226, in supports_igemmlt
if torch.cuda.get_device_capability(device=device) < (7, 5):
File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 381, in get_device_capability
prop = get_device_properties(device)
File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 395, in get_device_properties
_lazy_init() # will define _get_device_properties
File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
@benjaminhuo Getting the same issue as you.
File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 293, in forward using_igemmlt = supports_igemmlt(A.device) and not state.force_no_igemmlt File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 226, in supports_igemmlt if torch.cuda.get_device_capability(device=device) < (7, 5): File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 381, in get_device_capability prop = get_device_properties(device) File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 395, in get_device_properties _lazy_init() # will define _get_device_properties File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled
https://github.com/TimDettmers/bitsandbytes/blob/18e827d666fa2b70a12d539ccedc17aa51b2c97c/bitsandbytes/autograd/_functions.py#L227
This seems to be due to calling torch.cuda even when the device type isn't cuda. One way to patch these unchecked torch.cuda calls is to add a device check like:
if device.type != 'cuda':
    return False
(Tensors on MPS report "mps" as device.type.)
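For anyone who wants to patch this locally until it's fixed upstream, here is a minimal sketch of where such a guard could go (simplified; the real supports_igemmlt does additional checks):

import torch

def supports_igemmlt(device: torch.device) -> bool:
    # Bail out early for non-CUDA devices (e.g. "mps" or "cpu"): querying
    # torch.cuda for them raises "Torch not compiled with CUDA enabled".
    if device.type != "cuda":
        return False
    # The check from the trace above: igemmlt needs compute capability >= 7.5.
    if torch.cuda.get_device_capability(device=device) < (7, 5):
        return False
    return True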
Same issue here, MPS seems to be the problem.
Getting the same issue with Apple silicon. Would love to see some support for it soon!
Same issue. Would be nice to have support for MPS.
Same here, please have support for MPS https://github.com/ml-explore/mlx
(torch-gpu) I542464@DY4GPKX1J0 test % python3 fine_tune_llama_2_in_google_colab.py
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Loading checkpoint shards: 100%|██████████| 2/2 [00:32<00:00, 16.06s/it]
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/peft/utils/other.py:102: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
  warnings.warn(
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:159: UserWarning: You didn't pass a max_seq_length argument to the SFTTrainer, this will default to 1024
  warnings.warn(
  0%|          | 0/250 [00:00<?, ?it/s]
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.
Traceback (most recent call last):
  File "/Users/I542464/test/fine_tune_llama_2_in_google_colab.py", line 229, in <module>
    trainer.train()
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 685, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
    query_states = self.q_proj(hidden_states)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/peft/tuners/lora.py", line 1123, in forward
    result = super().forward(x)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 221, in forward
    out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py", line 567, in matmul_4bit
    assert quant_state is not None
AssertionError
  0%|          | 0/250 [00:01<?, ?it/s]
+1 MPS support would be absolutely great!
adding a comment to keep this alive. MPS support would be awesome!
Once the device abstraction has been merged, we can start adding MPS-accelerated versions of the functions.
Yay, thanks for all your efforts. On a side note: how can someone become skilled enough to contribute to this stuff? What topics should they cover?
Looking forward to MPS support!
Looking forward to MPS Support!!!!
looking forward to mps support
Looking forward to mps support!
Please support MPS.
Please support MPS. Looking forward to it.
Hey everyone, we're committed to enabling Apple Silicon support. There's a lot of ongoing work to get out of the way to lay the groundwork for this.
We'll keep you posted. Thanks for your interest and support of BNB 🤗
For the time being, for those on Apple Silicon who want to get unblocked ASAP: you can use MLX to run HuggingFace models locally, with GPU and unified-memory support (see the minimal example after the list below).
The mlx-examples repo is a good place to start as it contains:
- scripts that download models from HuggingFace and convert the weight tensors to MLX format
- integration between the MLX Model and HuggingFace tokenizers using AutoTokenizer.from_pretrained
- support for model fine-tuning with LoRA and quantization (QLoRA)
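For example, a minimal sketch using the mlx-lm package (the model name and parameters are only illustrative; check the mlx-examples README for current usage):

# pip install mlx-lm
from mlx_lm import load, generate

# Any MLX-converted checkpoint from the mlx-community org on the Hugging Face Hub
# should work here; this particular model name is just an example.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

prompt = "Explain in one paragraph what 8-bit quantization does."
text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
print(text)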