[Feat]: Add support for DGX Spark

Open Wayfa opened this issue 1 month ago • 10 comments

What happened?

There's a chance I'm doing something wrong, but I suspect it's a real incompatibility. install.sh fails on an Nvidia DGX Spark while installing the CUDA requirements.

What did you expect would happen?

The install completes as expected.

Relevant log output

Collecting psutil==7.0.0 (from -r requirements-global.txt (line 60))
  Using cached psutil-7.0.0-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (22 kB)
Collecting requests==2.32.3 (from -r requirements-global.txt (line 61))
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepdiff==8.6.1 (from -r requirements-global.txt (line 62))
  Using cached deepdiff-8.6.1-py3-none-any.whl.metadata (8.6 kB)
Collecting torch==2.7.1+cu128 (from -r requirements-cuda.txt (line 3))
  Using cached https://download.pytorch.org/whl/cu128/torch-2.7.1%2Bcu128-cp310-cp310-manylinux_2_28_aarch64.whl.metadata (29 kB)
ERROR: Ignored the following yanked versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.15.0
ERROR: Ignored the following versions that require a different python version: 1.16.0 Requires-Python >=3.11; 1.16.0rc1 Requires-Python >=3.11; 1.16.0rc2 Requires-Python >=3.11; 1.16.1 Requires-Python >=3.11; 1.16.2 Requires-Python >=3.11; 1.16.3 Requires-Python >=3.11; 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10; 2.3.0 Requires-Python >=3.11; 2.3.1 Requires-Python >=3.11; 2.3.2 Requires-Python >=3.11; 2.3.3 Requires-Python >=3.11; 2.3.4 Requires-Python >=3.11
ERROR: Could not find a version that satisfies the requirement torchvision==0.22.1+cu128 (from versions: 0.1.6, 0.2.0, 0.11.3, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.17.2, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.20.0, 0.20.1, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.24.0)
ERROR: No matching distribution found for torchvision==0.22.1+cu128
ERROR conda.cli.main_run:execute(127): `conda run python -m pip install --upgrade --upgrade-strategy eager -r requirements-global.txt -r requirements-cuda.txt` failed. (See above for error)

Generate and upload debug_report.log

No response

Wayfa avatar Nov 01 '25 11:11 Wayfa

We don't have a Spark, so it's not supported. If you can get it to run, please provide the necessary information or a PR so we can support it.

dxqb avatar Nov 01 '25 11:11 dxqb

I don't have enough tech oomph to update OT's dependencies myself, though I can tell torchvision and onnxruntime were causing issues. I can also try any ideas or suggestions if someone has them.

Wayfa avatar Nov 01 '25 15:11 Wayfa

You could try this: https://github.com/Nerogar/OneTrainer/pull/1020

dxqb avatar Nov 06 '25 10:11 dxqb

There seem to be more incompatibilities along the way:

Collecting deepdiff==8.6.1 (from -r requirements-global.txt (line 62))
  Using cached deepdiff-8.6.1-py3-none-any.whl.metadata (8.6 kB)
ERROR: Ignored the following versions that require a different python version: 1.16.0 Requires-Python >=3.11; 1.16.0rc1 Requires-Python >=3.11; 1.16.0rc2 Requires-Python >=3.11; 1.16.1 Requires-Python >=3.11; 1.16.2 Requires-Python >=3.11; 1.16.3 Requires-Python >=3.11; 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10; 2.3.0 Requires-Python >=3.11; 2.3.1 Requires-Python >=3.11; 2.3.2 Requires-Python >=3.11; 2.3.3 Requires-Python >=3.11; 2.3.4 Requires-Python >=3.11
ERROR: Could not find a version that satisfies the requirement torch==2.8.0+cu128 (from versions: 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0, 2.7.0+cu128, 2.7.1, 2.7.1+cu128, 2.8.0, 2.9.0, 2.9.0+cu128)
ERROR: No matching distribution found for torch==2.8.0+cu128
ERROR conda.cli.main_run:execute(127): conda run python -m pip install --upgrade --upgrade-strategy eager -r requirements-global.txt -r requirements-cuda.txt failed. (See above for error)

Wayfa avatar Nov 10 '25 01:11 Wayfa

We cannot help much here without a Spark, but these are the basics: you should be able to find instructions online for installing torch on the Spark. Judging only from the error message, torch 2.8 with CUDA 12.8 is rejected for some reason, but torch 2.9 with CUDA 12.8 seems to be available. Maybe the Spark requires that. OneTrainer doesn't currently support torch 2.9, but only because it's quite new. You can try it.
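The reading of the error message can be checked mechanically. This is a minimal sketch, assuming pip's usual "(from versions: ...)" line format; `cuda_builds` is a hypothetical helper name and not part of pip or OneTrainer.

```python
import re


def cuda_builds(pip_error: str, tag: str = "cu128") -> list[str]:
    """Return the versions in pip's "(from versions: ...)" list that carry "+<tag>"."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    versions = [v.strip() for v in match.group(1).split(",")]
    return [v for v in versions if v.endswith("+" + tag)]


# Error line taken from the log in this thread, with the version list
# shortened to the 2.7+ entries.
error = (
    "ERROR: Could not find a version that satisfies the requirement "
    "torch==2.8.0+cu128 (from versions: 2.7.0, 2.7.0+cu128, 2.7.1, "
    "2.7.1+cu128, 2.8.0, 2.9.0, 2.9.0+cu128)"
)

# prints ['2.7.0+cu128', '2.7.1+cu128', '2.9.0+cu128']
print(cuda_builds(error))
```

On this index there is no 2.8.0+cu128 build for the platform, so 2.7.x and 2.9.0 are the only cu128 candidates.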

dxqb avatar Nov 10 '25 07:11 dxqb

OK, I managed to make it work by installing PyTorch 2.9 and some other random dependencies. It runs, though it looks weird. I'll try an SDXL training today and confirm whether it works.

Wayfa avatar Nov 10 '25 10:11 Wayfa

OK, it works. I got a couple of really weird trainings, but I guess that was just the settings. OT can be made to work on the DGX Spark by forcing different versions of PyTorch and other dependencies.

Wayfa avatar Nov 11 '25 10:11 Wayfa

Can you post what you did? If you know how, you can also open a pull request. If you don't remember what you changed, please post the output of

source venv/bin/activate
pip freeze

if you're on Linux. If you're on Windows, you have to adjust the first line.
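If activating the environment is awkward (the install log above goes through conda rather than a plain venv), a rough equivalent can be run with that environment's Python interpreter. This is only a sketch using the standard library's `importlib.metadata`; `freeze_lines` is a hypothetical name, and unlike `pip freeze` it reports editable installs as plain `name==version` instead of `-e git+...` lines.

```python
from importlib.metadata import distributions


def freeze_lines() -> list[str]:
    """List installed distributions as "name==version", sorted by name."""
    lines = set()
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip entries whose metadata is missing or broken
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


print("\n".join(freeze_lines()))
```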

dxqb avatar Nov 11 '25 12:11 dxqb

Okay! But what I did was fairly ad hoc, and this copy of OT is likely unstable right now, besides looking funny. Here are the installed packages. I basically installed whatever would fit the conda env, and fell back to the CPU build where no GPU version was available.

pip freeze
absl-py==2.3.1
accelerate==1.7.0
adv_optm==1.1.3
aiodns==3.5.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.2
aiohttp-retry==2.9.1
aiosignal==1.4.0
annotated-doc==0.0.3
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.11.0
attrs==25.4.0
av==14.4.0
backoff==2.2.1
backports.zstd==1.0.0
bcrypt==5.0.0
bitsandbytes==0.46.0
boto3==1.40.66
botocore==1.40.66
Brotli==1.1.0
certifi==2025.10.5
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.0
cloudpickle==3.1.2
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.3
cryptography==45.0.7
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
decorator==5.2.1
deepdiff==8.6.1
Deprecated==1.3.1
-e git+https://github.com/huggingface/diffusers.git@9b721db205729d5a6e97a72312c3a0f4534064f1#egg=diffusers
dnspython==2.8.0
email-validator==2.3.0
fabric==3.2.2
fastapi==0.121.0
fastapi-cli==0.0.14
fastapi-cloud-cli==0.3.1
filelock==3.19.1
flatbuffers==25.9.23
fonttools==4.60.1
frozenlist==1.8.0
fsspec==2025.9.0
ftfy==6.3.1
gguf==0.17.1
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface-hub==0.34.4
humanfriendly==10.0
idna==3.11
imagesize==1.4.1
importlib_metadata==8.7.0
inquirerpy==0.3.4
invisible-watermark==0.2.0
invoke==2.2.1
itsdangerous==2.2.0
Jinja2==3.1.6
jmespath==1.0.1
kiwisolver==1.4.9
lightning-utilities==0.15.2
lion-pytorch==0.2.3
Markdown==3.10
markdown-it-py==4.0.0
MarkupSafe==2.1.5
matplotlib==3.10.3
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@50a2394c626c3307f9091b6a1831ae80fe3f2237#egg=mgds
mpmath==1.3.0
multidict==6.7.0
networkx==3.5
numpy==2.2.6
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-ml-py==13.580.82
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
omegaconf==2.3.0
-e git+https://github.com/Open-Model-Initiative/OMI-Model-Standards.git@f14b1da606811d2004f9241c3463c240eaf09ac5#egg=omi_model_standards
onnxruntime==1.22.0
open_clip_torch==2.32.0
opencv-python==4.11.0.86
orderly-set==5.5.0
orjson==3.11.4
packaging==25.0
paramiko==4.0.0
pfzy==0.3.4
pillow==11.3.0
platformdirs==4.5.0
pooch==1.8.2
prettytable==3.16.0
prodigy-plus-schedule-free==2.0.1
prodigyopt==1.1.2
prompt_toolkit==3.0.52
propcache==0.4.1
protobuf==6.33.0
psutil==7.0.0
py-cpuinfo==9.0.0
pycares==4.11.0
pycparser==2.23
pydantic==2.12.4
pydantic-extra-types==2.10.6
pydantic-settings==2.11.0
pydantic_core==2.41.5
Pygments==2.19.2
PyNaCl==1.6.0
pyparsing==3.2.5
python-dateutil==2.9.0.post0
python-dotenv==1.2.1
python-multipart==0.0.20
pytorch-lightning==2.5.1.post0
pytorch_optimizer==3.6.0
PyWavelets==1.9.0
PyYAML==6.0.2
regex==2025.11.3
requests==2.32.3
rich==14.2.0
rich-toolkit==0.15.1
rignore==0.7.5
runpod==1.7.10
s3transfer==0.14.0
safetensors==0.5.3
scalene==1.5.51
scenedetect==0.6.6
schedulefree==1.4.1
scipy==1.15.3
sentencepiece==0.2.0
sentry-sdk==2.43.0
setuptools==70.2.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.49.3
sympy==1.14.0
tensorboard==2.19.0
tensorboard-data-server==0.7.2
timm==1.0.22
tokenizers==0.21.4
tomli==2.3.0
tomlkit==0.13.3
torch==2.9.0+cu128
torchmetrics==1.8.2
torchvision==0.24.0
tqdm==4.67.1
tqdm-loggable==0.2
transformers==4.52.4
triton==3.5.0
typer==0.20.0
typer-slim==0.20.0
typing-inspection==0.4.2
typing_extensions==4.15.0
ujson==5.11.0
urllib3==2.5.0
uvicorn==0.38.0
uvloop==0.22.1
watchdog==6.0.0
watchfiles==1.1.1
wcwidth==0.2.14
websockets==15.0.1
Werkzeug==3.1.3
wheel==0.45.1
wrapt==2.0.0
yarl==1.22.0
yt-dlp==2025.10.22
zipp==3.23.0

Wayfa avatar Nov 11 '25 21:11 Wayfa

This probably did it:

> torch==2.9.0+cu128
> transformers==4.52.4
> triton==3.5.0

If that's right, this has to wait for the upgrade to torch 2.9. It's still too new for us to upgrade anytime soon; torch is working on a 2.9.1 release.
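One way to confirm which pins actually moved is to diff the freeze output against the pinned requirements. This is a rough sketch, not OneTrainer tooling: `parse_pins` and `changed_pins` are hypothetical helpers, and the sample strings use only the pins visible in the first install log and the matching lines from the freeze. The requirements files themselves are not shown in this thread, so transformers and triton cannot be compared this way here.

```python
def parse_pins(text: str) -> dict[str, str]:
    """Map package name -> version for every "name==version" line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, options and editable installs ("-e git+...").
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def changed_pins(requirements: str, freeze: str) -> dict[str, tuple[str, str]]:
    """Return {name: (pinned, installed)} for packages whose versions differ."""
    wanted, installed = parse_pins(requirements), parse_pins(freeze)
    return {
        name: (wanted[name], installed[name])
        for name in wanted
        if name in installed and wanted[name] != installed[name]
    }


# Pins as they appear in the first install log of this thread.
requirements = """\
psutil==7.0.0
requests==2.32.3
deepdiff==8.6.1
torch==2.7.1+cu128
torchvision==0.22.1+cu128
"""

# Matching lines from the pip freeze output posted in this thread.
freeze = """\
deepdiff==8.6.1
psutil==7.0.0
requests==2.32.3
torch==2.9.0+cu128
torchvision==0.24.0
"""

for name, (pinned, got) in changed_pins(requirements, freeze).items():
    print(f"{name}: pinned {pinned}, installed {got}")
```

For these sample pins, only torch and torchvision are reported as changed.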

dxqb avatar Nov 13 '25 18:11 dxqb