[RFC] Cross-Platform Refactor: Overview + Link Hub
Disclaimer: This document is dynamic and will be updated to reflect the evolving consensus and decisions made throughout our discussions in each RFC issue.
Central Hub for Cross-Platform Enhancements
Welcome to the meta RFC (Request for Comments) that maps out the enhancements needed to enable cross-platform compatibility within bitsandbytes. This thread is intended to be the nucleus of all RFC discussions, interlinking the various topics and proposals.
Our mission is to consolidate efforts from across our community, establish clear objectives, agree on optimal strategies, and coordinate contributions to ensure a seamless transition to full cross-platform support.
We would like an approach where the individual algorithms of the backend can be implemented gradually, or even only partially, by whoever is willing to put in the work.
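As a rough illustration of what such a gradual, partial implementation could look like, a backend might only override the operations it already supports and signal the rest clearly (a minimal sketch; the `Backend` base class and method names here are hypothetical, not the actual bitsandbytes interface):

```python
# Minimal sketch of a partially implemented backend. The base class and the
# operation names are hypothetical placeholders, not the real bitsandbytes API.
class Backend:
    def quantize_4bit(self, tensor, **kwargs):
        raise NotImplementedError(
            f"{type(self).__name__} does not implement quantize_4bit yet"
        )

    def igemmlt(self, A, B, **kwargs):
        raise NotImplementedError(
            f"{type(self).__name__} does not implement igemmlt yet"
        )


class PartialCpuBackend(Backend):
    # Only the ops someone has had time to port are overridden; everything
    # else raises a clear NotImplementedError instead of failing obscurely.
    def quantize_4bit(self, tensor, **kwargs):
        ...  # CPU implementation would go here
```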
The core areas identified by the BNB maintainers for community input and collaboration include:
Testing and CI/CD Infrastructure
To facilitate community contributions and ensure rapid integration, we're focusing on establishing a robust CI/CD framework. This system should support all platforms, providing immediate and actionable feedback on submitted PRs. Essential to this effort is addressing and resolving existing issues like flaky tests.
- Detailed Discussion: [RFC] Cross-Platform Refactor: Testing and CI/CD Strategy #1031
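As one concrete, purely illustrative possibility for per-platform gating in such a CI setup, tests could be skipped based on the available hardware and OS via pytest markers:

```python
# Illustrative sketch of per-platform test gating with pytest markers; this is
# not the current bitsandbytes test suite, just an example of the mechanism.
import platform

import pytest
import torch

requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="CUDA not available on this runner"
)


@requires_cuda
def test_cuda_kernel_path():
    assert torch.zeros(1, device="cuda").sum().item() == 0


@pytest.mark.skipif(platform.system() == "Windows", reason="not yet supported on Windows")
def test_unix_only_path():
    assert True
```

Known-flaky tests could similarly be quarantined behind a dedicated marker so they don't block otherwise green PRs.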
Build Process and Distribution
Adapting our build and distribution processes to accommodate various platforms is crucial. We're exploring efficient strategies, including CMake and GitHub Actions, for building binaries. Additionally, we need to tackle challenges related to package hosting, size constraints, and the distribution of binary wheels.
- In-Depth Conversation: [RFC] Cross-Platform Refactor: Build System and Binary Distribution #1032
- related: [RFC] Cross-Platform Refactor: CPU-only implementation #1021
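One small but recurring piece of the binary-distribution puzzle is that the Python layer has to resolve a platform-specific library name at import time; a hedged sketch (the naming scheme and directory layout are assumptions, not the actual packaging):

```python
# Hedged sketch of resolving a platform-specific native library name at import
# time. The naming scheme and layout are illustrative assumptions, not the
# actual bitsandbytes packaging.
import platform
from pathlib import Path


def native_library_name(backend: str = "cuda") -> str:
    ext = {"Linux": ".so", "Darwin": ".dylib", "Windows": ".dll"}.get(
        platform.system(), ".so"
    )
    return f"libbitsandbytes_{backend}{ext}"


def find_native_library(search_dir: Path, backend: str = "cuda") -> Path:
    candidate = search_dir / native_library_name(backend)
    if not candidate.exists():
        raise FileNotFoundError(f"expected native library not found: {candidate}")
    return candidate
```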
Setup
There's been ongoing discussion around improving bitsandbytes/cuda_setup, and it plays a role in the Intel and Windows related PRs, so we need to make sure we align how this module evolves in light of the other topics.
Another topic: we receive many issues that are assumed to be the same error but aren't. We need to make sure that when people hit problems surfaced through this module, the resulting reports are actionable (e.g. introduce error codes, improve the issue template).
- In-Depth Conversation: #918
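To make the error-code idea concrete, here is a purely hypothetical sketch (the codes, names, and message format are assumptions, not an agreed-upon design):

```python
# Hypothetical sketch of stable setup error codes; the codes and messages are
# illustrative assumptions, not an agreed-upon bitsandbytes design.
from enum import Enum


class SetupErrorCode(Enum):
    CUDA_RUNTIME_NOT_FOUND = "BNB-E001"
    UNSUPPORTED_CUDA_VERSION = "BNB-E002"
    NO_GPU_DETECTED = "BNB-E003"
    BINARY_MISSING_FOR_PLATFORM = "BNB-E004"


def report(code: SetupErrorCode, detail: str) -> str:
    # A stable, greppable prefix lets the issue template ask "which error code
    # did you see?" and helps maintainers separate look-alike reports.
    return f"[{code.value}] {code.name}: {detail}"


print(report(SetupErrorCode.NO_GPU_DETECTED, "torch.cuda.device_count() == 0"))
```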
Intel CPU + GPU backend
This RFC proposes extending the bitsandbytes library to support Intel CPUs and GPUs. The approach includes introducing a device abstraction layer to simplify adding non-CUDA devices and leveraging the PyTorch 2.x compiler stack alongside Intel Extension for PyTorch (IPEX) for lightweight integration.
This enables 8-bit and 4-bit precision features on Intel platforms without the need for native backend code, reducing complexity and maintenance. Key performance functions will utilize IPEX, while others will be optimized using PyTorch's compilation technology. The plan involves phased PRs to implement these features, alongside proposed changes to the Transformers library to expand bitsandbytes API usage across multiple devices.
Steps (as PRs):
- #898 (important ongoing discussion about the main abstraction for supporting other backends)
- jianan-gu#3
- jianan-gu#4
(Since the unmerged PRs have dependencies on each other, some of them live in other repos; they will be rebased once everything is ready.)
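For orientation, a very rough sketch of the kind of per-device backend dispatch being discussed in #898 (the registry, class names, and dispatch key are illustrative assumptions; see the PR for the actual design):

```python
# Rough illustration of a per-device backend registry; names and structure are
# assumptions for discussion, not the design adopted in #898.
import torch


class CudaBackend:
    def quantize_4bit(self, tensor: torch.Tensor, **kwargs):
        ...  # existing CUDA kernel path would live here


class IntelBackend:
    def quantize_4bit(self, tensor: torch.Tensor, **kwargs):
        ...  # IPEX / torch.compile based path would live here


# Dispatch keyed on the tensor's device type.
BACKENDS = {
    "cuda": CudaBackend(),
    "cpu": IntelBackend(),
    "xpu": IntelBackend(),
}


def quantize_4bit(tensor: torch.Tensor, **kwargs):
    backend = BACKENDS.get(tensor.device.type)
    if backend is None:
        raise NotImplementedError(
            f"No backend registered for device '{tensor.device.type}'"
        )
    return backend.quantize_4bit(tensor, **kwargs)
```

Dispatching on `tensor.device.type` would keep the public API unchanged while letting each backend be filled in independently.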
AMD
- PRs: #756
- AMD semi-official fork issues: https://github.com/nixified-ai/flake/issues/56
@arlo-phoenix or @fxmarty, please create an RFC issue in the same format as the Apple Silicon one to centralize discussions, decisions, and the tracking of open work.
I only skimmed through #898, but from what I can see the idea is to add the ability to have different backends, with the current implementation becoming a CudaBackend. From my perspective this won't really change this PR that much (I'd only have to move some checks), since there isn't really a need for a separate HIP backend and AMD GPUs should just use the CudaBackend as well.
- One improvement could be moving the defines to a separate header, hip-compat.h, so they are better separated.
- The Makefile definitely still needs work; as mentioned, I've never worked with Makefiles directly.
- If there is a move towards CMake for Windows support (I think there are several PRs), I could try to make this work with CMake. It should be easier to add good integration that doesn't interfere with the CUDA compilation, since I'm more experienced with that.
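As a small, purely illustrative footnote to the point about reusing the CudaBackend: ROCm builds of PyTorch expose HIP devices under the `cuda` device type, so on the Python side AMD GPUs would already hit the CUDA entry of a registry like the hypothetical one sketched in the Intel section above.

```python
# Illustration only: on a ROCm build of PyTorch, HIP devices still report the
# "cuda" device type, which is why a dedicated HIP backend may not be needed
# on the Python side (the native kernels are compiled for HIP instead).
import torch

if torch.cuda.is_available():
    print(torch.zeros(1, device="cuda").device.type)  # prints "cuda", also on ROCm
    print(torch.version.hip)  # a version string on ROCm builds, None on CUDA builds
```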
Apple Silicon
Summary and coordination of the ongoing efforts can be found and should be discussed here: [RFC] Cross-Platform Refactor: Mac M1 support.
Related issues: #252, #485
Windows
@wkpark: please provide the overview given in https://github.com/TimDettmers/bitsandbytes/discussions/990#discussioncomment-8314733 in a dedicated RFC issue akin to the Apple Silicon one above.
Community Contributions and Engagement
We believe in the power of community-driven development and encourage contributions from everyone. Your insights, code snippets, and solution proposals are super important for making this a reality.
The RFCs are meant for focused, goal-directed technical discussions. But don't hold back: let us know what you think, even if it's unstructured or you're not sure about it:
- For brainstorming and informal discussions: Join our community forum.
> @arlo-phoenix or @fxmarty, please create an RFC issue in the same format as the Apple Silicon one to centralize discussions, decisions, and the tracking of open work.
I'll write the RFC after the device abstraction is merged since most stuff going forward for HIP depends on that.
Edit: AMD themselves will open a PR, so there won't be an RFC.