[Bug]: dependency issue - bitsandbytes - program fails to initialise
What happened?
Running `./start-ui.sh` fails with a `ModuleNotFoundError` because the `bitsandbytes` package is not installed by `install.sh`.
What did you expect would happen?
The program would run successfully.
NOTE: users can fix this issue by running `python -m pip install bitsandbytes==0.43.3`. Alternatively, one of the devs can update the default requirements to include the package.
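As a quick sanity check before (or after) running that command, the package's presence can be probed from Python without importing it. This is a generic stdlib check, not part of OneTrainer (`has_package` is a made-up helper name):

```python
import importlib.util

def has_package(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    # find_spec looks the module up without actually importing it
    return importlib.util.find_spec(name) is not None

if not has_package("bitsandbytes"):
    # Mirrors the workaround above: install the pinned version manually.
    print("bitsandbytes missing - try: python -m pip install bitsandbytes==0.43.3")
```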
Relevant log output
```
(venv) user@XPS-13:OneTrainer$ ./start-ui.sh
conda not found; python version correct; use native python
Traceback (most recent call last):
  File "/home/user/Documents/fs-tools/OneTrainer/scripts/train_ui.py", line 5, in <module>
    from modules.ui.TrainUI import TrainUI
  File "/home/user/Documents/fs-tools/OneTrainer/modules/ui/TrainUI.py", line 9, in <module>
    from modules.trainer.GenericTrainer import GenericTrainer
  File "/home/user/Documents/fs-tools/OneTrainer/modules/trainer/GenericTrainer.py", line 17, in <module>
    from modules.trainer.BaseTrainer import BaseTrainer
  File "/home/user/Documents/fs-tools/OneTrainer/modules/trainer/BaseTrainer.py", line 8, in <module>
    from modules.util import create
  File "/home/user/Documents/fs-tools/OneTrainer/modules/util/create.py", line 6, in <module>
    from modules.dataLoader.FluxBaseDataLoader import FluxBaseDataLoader
  File "/home/user/Documents/fs-tools/OneTrainer/modules/dataLoader/FluxBaseDataLoader.py", line 6, in <module>
    from modules.model.FluxModel import FluxModel
  File "/home/user/Documents/fs-tools/OneTrainer/modules/model/FluxModel.py", line 8, in <module>
    from modules.module.LoRAModule import LoRAModuleWrapper
  File "/home/user/Documents/fs-tools/OneTrainer/modules/module/LoRAModule.py", line 8, in <module>
    from modules.util.quantization_util import get_unquantized_weight, get_weight_shape
  File "/home/user/Documents/fs-tools/OneTrainer/modules/util/quantization_util.py", line 8, in <module>
    import bitsandbytes as bnb
ModuleNotFoundError: No module named 'bitsandbytes'
(venv) user@XPS-13:OneTrainer$
```
Output of pip freeze
```
absl-py==2.1.0 accelerate==0.30.1 aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 antlr4-python3-runtime==4.9.3 async-timeout==4.0.3 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 cloudpickle==3.0.0 coloredlogs==15.0.1 contourpy==1.3.0 customtkinter==5.2.2 cycler==0.12.1 dadaptation==3.2 darkdetect==0.8.0 -e git+https://github.com/huggingface/diffusers.git@2ee3215949d8f2d3141c2340d8e4d24ec94b2384#egg=diffusers filelock==3.16.0 flatbuffers==24.3.25 fonttools==4.53.1 frozenlist==1.4.1 fsspec==2024.9.0 ftfy==6.2.3 grpcio==1.66.1 huggingface-hub==0.23.3 humanfriendly==10.0 idna==3.8 importlib_metadata==8.4.0 invisible-watermark==0.2.0 Jinja2==3.1.4 kiwisolver==1.4.7 lightning-utilities==0.11.7 lion-pytorch==0.1.4 Markdown==3.7 markdown-it-py==3.0.0 MarkupSafe==2.1.5 matplotlib==3.9.0 mdurl==0.1.2 -e git+https://github.com/Nerogar/mgds.git@85bf18746488a898818c36eca651d24734f87431#egg=mgds mpmath==1.3.0 multidict==6.1.0 networkx==3.3 numpy==1.26.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.6.68 nvidia-nvtx-cu12==12.1.105 omegaconf==2.3.0 onnxruntime==1.18.0 open-clip-torch==2.24.0 opencv-python==4.9.0.80 packaging==24.1 pillow==10.3.0 platformdirs==4.3.2 pooch==1.8.1 prodigyopt==1.0 protobuf==4.25.4 psutil==6.0.0 Pygments==2.18.0 pynvml==11.5.0 pyparsing==3.1.4 python-dateutil==2.9.0.post0 pytorch-lightning==2.2.5 pytorch_optimizer==3.0.2 PyWavelets==1.7.0 PyYAML==6.0.1 regex==2024.7.24 requests==2.32.3 rich==13.8.1 safetensors==0.4.3 scalene==1.5.41 schedulefree==1.2.5 scipy==1.13.1 sentencepiece==0.2.0 six==1.16.0 sympy==1.13.2 tensorboard==2.17.0 tensorboard-data-server==0.7.2 timm==1.0.9 tokenizers==0.19.1 torch==2.3.1 torchmetrics==1.4.1 torchvision==0.18.1
tqdm==4.66.4 transformers==4.42.3 triton==2.3.1 typing_extensions==4.12.2 urllib3==2.2.2 wcwidth==0.2.13 Werkzeug==3.0.4 yarl==1.11.1 zipp==3.20.1
```
What GPU do you have? This might be an issue with the requirements, because bitsandbytes only really works with Nvidia GPUs.
No GPU, and running on Ubuntu. It seems the program always tries to import bitsandbytes when initialising, which is an issue if bitsandbytes is only installed by `install.sh` when it detects an Nvidia GPU.
This means it won't start on Mac either, because bitsandbytes isn't supported on Apple Silicon.
I know that bitsandbytes is working on Apple M-chip support though. Not sure if it's out yet.
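For what it's worth, the usual way projects make a backend-specific dependency like this optional is a try/except guard at import time, so the program can still start on machines without it. The sketch below only illustrates that pattern under my own naming (`HAS_BNB` and the wrapper function are hypothetical, not OneTrainer's actual `quantization_util` code):

```python
# Optional-dependency guard: importing this module succeeds even when
# bitsandbytes is absent (no Nvidia GPU, Apple Silicon, Intel CPU, ...).
try:
    import bitsandbytes as bnb
    HAS_BNB = True
except ImportError:
    bnb = None
    HAS_BNB = False

def get_weight_maybe_quantized(module):
    """Illustrative wrapper: only enter bnb-specific code when available."""
    if not HAS_BNB:
        return module.weight  # plain (unquantized) weights on non-CUDA setups
    # ... bnb-specific dequantization would go here ...
    return module.weight
```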
Since this is not a bug but expected behaviour, I will be closing. Please open a feature request once Bitsandbytes adds Apple M-chip support 🫡
@O-J1 can you clarify how this is expected behaviour?
Apologies if this wasn't clear in the original issue I raised - the bug is due to the fact that OneTrainer needs `bitsandbytes` in order to run; however, `install.sh` didn't install `bitsandbytes` when I ran the script*, which means OneTrainer could not run.
*note: I haven't re-read the installation scripts since I raised the issue, so this may have been fixed in one of the recent changes
> The bug is due to the fact that OneTrainer needs `bitsandbytes` in order to run, however `install.sh` didn't install `bitsandbytes` when I ran the script*, which means OneTrainer could not run
Is this still the case? That should be fixed already
> @O-J1 can you clarify how this is expected behaviour?
@cchance27
Apologies, brain fart on my part - I don't know why, but I only had cchance's "Apple CPUs" comment in mind when I wrote the response.
As Nero said, pretty sure this was fixed, but I must note that:
https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1338
https://huggingface.co/docs/bitsandbytes/main/en/installation?backend=Intel+CPU+%2B+GPU&platform=Intel+CPU%2BGPU#multi-backend
Intel CPU (only Intel CPU) is still noted as Alpha, so go into this with a very large grain of salt - your speed is going to be absolutely awful. I've mucked around with training 2M-param models (which is very small) on CPU (yolov8) and it was agonising.
When you get some time, can you do a quick check (nuke your repo, reclone and try again)? If the issue persists, let's reopen 👍
@Nerogar @O-J1 thanks both for getting back to me, really appreciated :)
I haven't tested if this is still the case (I fixed the issue locally and haven't pulled the repo since)
Also, I didn't realise Intel CPU is still noted as Alpha, but that probably explains how I ran into the problem in the first place, since I'm running on an Intel CPU with integrated graphics. Thanks for supplying the links 👍
Ah yes, yolov8 on CPU does sound agonising, fortunately I'm not doing anything near as serious. Currently, I just have OneTrainer on a test rig I use for experimenting, so I'm more than happy to wipe my local repo and retry - I'll do so and update here in the next 24hrs
@Nerogar @O-J1 sorry it's been a minute - just updating to let you both know that the issue seems to be fixed in the current version of the repo (looks like it was fixed in this commit)
Had no problems whatsoever when retrying with a fresh clone of OneTrainer :+1:
Exact steps taken enclosed for posterity:

- wipe existing local repo

  ```
  user@XPS-13:~$ rm -rf OneTrainer/
  ```

- clone latest version (currently commit d738055) of repo

  ```
  user@XPS-13:~$ git clone git@github.com:Nerogar/OneTrainer.git
  ```

- change into OneTrainer directory and run installation script

  ```
  user@XPS-13:~$ cd OneTrainer/
  user@XPS-13:OneTrainer$ OT_PYTHON_CMD="python3.10" OT_PREFER_VENV="true" ./install.sh
  ```

  note: optional environment variables set (per instructions in LAUNCH-SCRIPTS.md) because there is no default python on this system and I don't use conda

- run the start script

  ```
  user@XPS-13:OneTrainer$ OT_PYTHON_CMD="python3.10" OT_PREFER_VENV="true" ./start-ui.sh
  ```
Also, compliments to whoever is responsible for the new .sh scripts - very elegant way to handle the launch system and just really well written :smile: