[Bug]: dependency issue - bitsandbytes - program fails to initialise
What happened?
Running `./start-ui.sh` fails with a `ModuleNotFoundError` because the `bitsandbytes` package is not installed by `install.sh`.
What did you expect would happen?
The program would run successfully.
NOTE: users can fix this issue by running `python -m pip install bitsandbytes==0.43.3`. Alternatively, one of the devs can update the default requirements to include the package.
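As a quick sanity check before (or after) running that command, the package's presence can be probed from Python without importing it. This is a generic stdlib check, not part of OneTrainer (`has_package` is a made-up helper name):

```python
import importlib.util

def has_package(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    # find_spec looks the module up without actually importing it
    return importlib.util.find_spec(name) is not None

if not has_package("bitsandbytes"):
    # Mirrors the workaround above: install the pinned version manually.
    print("bitsandbytes missing - try: python -m pip install bitsandbytes==0.43.3")
```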
Relevant log output
```
(venv) user@XPS-13:OneTrainer$ ./start-ui.sh
conda not found; python version correct; use native python
Traceback (most recent call last):
  File "/home/user/Documents/fs-tools/OneTrainer/scripts/train_ui.py", line 5, in <module>
    from modules.ui.TrainUI import TrainUI
  File "/home/user/Documents/fs-tools/OneTrainer/modules/ui/TrainUI.py", line 9, in <module>
    from modules.trainer.GenericTrainer import GenericTrainer
  File "/home/user/Documents/fs-tools/OneTrainer/modules/trainer/GenericTrainer.py", line 17, in <module>
    from modules.trainer.BaseTrainer import BaseTrainer
  File "/home/user/Documents/fs-tools/OneTrainer/modules/trainer/BaseTrainer.py", line 8, in <module>
    from modules.util import create
  File "/home/user/Documents/fs-tools/OneTrainer/modules/util/create.py", line 6, in <module>
    from modules.dataLoader.FluxBaseDataLoader import FluxBaseDataLoader
  File "/home/user/Documents/fs-tools/OneTrainer/modules/dataLoader/FluxBaseDataLoader.py", line 6, in <module>
    from modules.model.FluxModel import FluxModel
  File "/home/user/Documents/fs-tools/OneTrainer/modules/model/FluxModel.py", line 8, in <module>
    from modules.module.LoRAModule import LoRAModuleWrapper
  File "/home/user/Documents/fs-tools/OneTrainer/modules/module/LoRAModule.py", line 8, in <module>
    from modules.util.quantization_util import get_unquantized_weight, get_weight_shape
  File "/home/user/Documents/fs-tools/OneTrainer/modules/util/quantization_util.py", line 8, in <module>
    import bitsandbytes as bnb
ModuleNotFoundError: No module named 'bitsandbytes'
(venv) user@XPS-13:OneTrainer$
```
Output of pip freeze
```
absl-py==2.1.0 accelerate==0.30.1 aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 antlr4-python3-runtime==4.9.3 async-timeout==4.0.3 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 cloudpickle==3.0.0 coloredlogs==15.0.1 contourpy==1.3.0 customtkinter==5.2.2 cycler==0.12.1 dadaptation==3.2 darkdetect==0.8.0 -e git+https://github.com/huggingface/diffusers.git@2ee3215949d8f2d3141c2340d8e4d24ec94b2384#egg=diffusers filelock==3.16.0 flatbuffers==24.3.25 fonttools==4.53.1 frozenlist==1.4.1 fsspec==2024.9.0 ftfy==6.2.3 grpcio==1.66.1 huggingface-hub==0.23.3 humanfriendly==10.0 idna==3.8 importlib_metadata==8.4.0 invisible-watermark==0.2.0 Jinja2==3.1.4 kiwisolver==1.4.7 lightning-utilities==0.11.7 lion-pytorch==0.1.4 Markdown==3.7 markdown-it-py==3.0.0 MarkupSafe==2.1.5 matplotlib==3.9.0 mdurl==0.1.2 -e git+https://github.com/Nerogar/mgds.git@85bf18746488a898818c36eca651d24734f87431#egg=mgds mpmath==1.3.0 multidict==6.1.0 networkx==3.3 numpy==1.26.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.6.68 nvidia-nvtx-cu12==12.1.105 omegaconf==2.3.0 onnxruntime==1.18.0 open-clip-torch==2.24.0 opencv-python==4.9.0.80 packaging==24.1 pillow==10.3.0 platformdirs==4.3.2 pooch==1.8.1 prodigyopt==1.0 protobuf==4.25.4 psutil==6.0.0 Pygments==2.18.0 pynvml==11.5.0 pyparsing==3.1.4 python-dateutil==2.9.0.post0 pytorch-lightning==2.2.5 pytorch_optimizer==3.0.2 PyWavelets==1.7.0 PyYAML==6.0.1 regex==2024.7.24 requests==2.32.3 rich==13.8.1 safetensors==0.4.3 scalene==1.5.41 schedulefree==1.2.5 scipy==1.13.1 sentencepiece==0.2.0 six==1.16.0 sympy==1.13.2 tensorboard==2.17.0 tensorboard-data-server==0.7.2 timm==1.0.9 tokenizers==0.19.1 torch==2.3.1 torchmetrics==1.4.1 torchvision==0.18.1
tqdm==4.66.4 transformers==4.42.3 triton==2.3.1 typing_extensions==4.12.2 urllib3==2.2.2 wcwidth==0.2.13 Werkzeug==3.0.4 yarl==1.11.1 zipp==3.20.1
```
What GPU do you have? This might be an issue with the requirements, because bitsandbytes only really works with Nvidia GPUs.
No GPU, and running on Ubuntu. It seems the program always tries to import bitsandbytes when initialising, which is an issue if bitsandbytes is only installed by `install.sh` when it detects an Nvidia GPU.
This means it won't start on Mac either, because bitsandbytes isn't supported on Apple Silicon.
I know that bitsandbytes is working on Apple M-chip support though. Not sure if it's out yet.
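For what it's worth, the usual way projects make a backend-specific dependency like this optional is a try/except guard at import time, so the program can still start on machines without it. The sketch below only illustrates that pattern under my own naming (`HAS_BNB` and the wrapper function are hypothetical, not OneTrainer's actual `quantization_util` code):

```python
# Optional-dependency guard: importing this module succeeds even when
# bitsandbytes is absent (no Nvidia GPU, Apple Silicon, Intel CPU, ...).
try:
    import bitsandbytes as bnb
    HAS_BNB = True
except ImportError:
    bnb = None
    HAS_BNB = False

def get_weight_maybe_quantized(module):
    """Illustrative wrapper: only enter bnb-specific code when available."""
    if not HAS_BNB:
        return module.weight  # plain (unquantized) weights on non-CUDA setups
    # ... bnb-specific dequantization would go here ...
    return module.weight
```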
Since this is not a bug but expected behaviour, I will be closing. Please open a feature request once Bitsandbytes adds Apple M-chip support 🫡
@O-J1 can you clarify how this is expected behaviour?
Apologies if this wasn't clear in the original issue I raised - the bug is due to the fact that OneTrainer needs `bitsandbytes` in order to run; however, `install.sh` didn't install `bitsandbytes` when I ran the script*, which means OneTrainer could not run.
*note: I haven't re-read the installation scripts since I raised the issue, so this may have been fixed in one of the recent changes
> The bug is due to the fact that OneTrainer needs `bitsandbytes` in order to run, however `install.sh` didn't install `bitsandbytes` when I ran the script*, which means OneTrainer could not run
Is this still the case? That should be fixed already
> @O-J1 can you clarify how this is expected behaviour?
@cchance27
Apologies, brain fart on my part - I don't know why, but I only had cchance's "Apple CPUs" comment in mind when I wrote the response.
As Nero said, pretty sure this was fixed, but I must note that:
https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1338
https://huggingface.co/docs/bitsandbytes/main/en/installation?backend=Intel+CPU+%2B+GPU&platform=Intel+CPU%2BGPU#multi-backend
Intel CPU (only Intel CPU) is still noted as Alpha, so go into this with a very large grain of salt - your speed is going to be absolutely awful. I've mucked around with training 2M-param models (which is very small) on CPU (yolov8) and it was agonising.
When you get some time, can you do a quick check (nuke your repo, reclone and try again)? If the issue persists, let's reopen 👍
@Nerogar @O-J1 thanks both for getting back to me, really appreciated :)
I haven't tested if this is still the case (I fixed the issue locally and haven't pulled the repo since)
Also, I didn't realise Intel CPU is still noted as Alpha, but that probably explains how I ran into the problem in the first place, since I'm running on an Intel CPU with integrated graphics. Thanks for supplying the links 👍
Ah yes, yolov8 on CPU does sound agonising, fortunately I'm not doing anything near as serious. Currently, I just have OneTrainer on a test rig I use for experimenting, so I'm more than happy to wipe my local repo and retry - I'll do so and update here in the next 24hrs
@Nerogar @O-J1 sorry it's been a minute - just updating to let you both know that the issue seems to be fixed in the current version of the repo (looks like it was fixed in this commit)
Had no problems whatsoever when retrying with a fresh clone of OneTrainer :+1:
Exact steps taken enclosed for posterity:

- wipe existing local repo

  ```
  user@XPS-13:~$ rm -rf OneTrainer/
  ```

- clone latest version (currently commit d738055) of repo

  ```
  user@XPS-13:~$ git clone git@github.com:Nerogar/OneTrainer.git
  ```

- change into OneTrainer directory and run installation script

  ```
  user@XPS-13:~$ cd OneTrainer/
  user@XPS-13:OneTrainer$ OT_PYTHON_CMD="python3.10" OT_PREFER_VENV="true" ./install.sh
  ```

  note: optional environment variables set (per instructions in LAUNCH-SCRIPTS.md) because there is no default python on this system and I don't use conda

- run the start script

  ```
  user@XPS-13:OneTrainer$ OT_PYTHON_CMD="python3.10" OT_PREFER_VENV="true" ./start-ui.sh
  ```
Also, compliments to whoever is responsible for the new .sh scripts - very elegant way to handle the launch system and just really well written :smile: