[Feat]: Add support for DGX Spark
What happened?
There's a chance I'm doing something wrong, but I suspect it's the case. install.sh fails on an NVIDIA DGX Spark while installing the CUDA requirements.
What did you expect would happen?
The install completes as expected.
Relevant log output
Collecting psutil==7.0.0 (from -r requirements-global.txt (line 60))
Using cached psutil-7.0.0-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (22 kB)
Collecting requests==2.32.3 (from -r requirements-global.txt (line 61))
Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepdiff==8.6.1 (from -r requirements-global.txt (line 62))
Using cached deepdiff-8.6.1-py3-none-any.whl.metadata (8.6 kB)
Collecting torch==2.7.1+cu128 (from -r requirements-cuda.txt (line 3))
Using cached https://download.pytorch.org/whl/cu128/torch-2.7.1%2Bcu128-cp310-cp310-manylinux_2_28_aarch64.whl.metadata (29 kB)
ERROR: Ignored the following yanked versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.15.0
ERROR: Ignored the following versions that require a different python version: 1.16.0 Requires-Python >=3.11; 1.16.0rc1 Requires-Python >=3.11; 1.16.0rc2 Requires-Python >=3.11; 1.16.1 Requires-Python >=3.11; 1.16.2 Requires-Python >=3.11; 1.16.3 Requires-Python >=3.11; 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10; 2.3.0 Requires-Python >=3.11; 2.3.1 Requires-Python >=3.11; 2.3.2 Requires-Python >=3.11; 2.3.3 Requires-Python >=3.11; 2.3.4 Requires-Python >=3.11
ERROR: Could not find a version that satisfies the requirement torchvision==0.22.1+cu128 (from versions: 0.1.6, 0.2.0, 0.11.3, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.17.2, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.20.0, 0.20.1, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.24.0)
ERROR: No matching distribution found for torchvision==0.22.1+cu128
ERROR conda.cli.main_run:execute(127): `conda run python -m pip install --upgrade --upgrade-strategy eager -r requirements-global.txt -r requirements-cuda.txt` failed. (See above for error)
Generate and upload debug_report.log
No response
We don't have a Spark, so it's not supported. If you can get it to run, please provide the necessary information or a PR so we can support it.
I don't have enough tech oomph to update OT's dependencies myself, though I can tell torchvision and onnxruntime were giving issues. I can also try any ideas or suggestions if someone has them.
You could try this: https://github.com/Nerogar/OneTrainer/pull/1020
It seems there are more incompatibilities along the way:
Collecting deepdiff==8.6.1 (from -r requirements-global.txt (line 62))
Using cached deepdiff-8.6.1-py3-none-any.whl.metadata (8.6 kB)
ERROR: Ignored the following versions that require a different python version: 1.16.0 Requires-Python >=3.11; 1.16.0rc1 Requires-Python >=3.11; 1.16.0rc2 Requires-Python >=3.11; 1.16.1 Requires-Python >=3.11; 1.16.2 Requires-Python >=3.11; 1.16.3 Requires-Python >=3.11; 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10; 2.3.0 Requires-Python >=3.11; 2.3.1 Requires-Python >=3.11; 2.3.2 Requires-Python >=3.11; 2.3.3 Requires-Python >=3.11; 2.3.4 Requires-Python >=3.11
ERROR: Could not find a version that satisfies the requirement torch==2.8.0+cu128 (from versions: 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0, 2.7.0+cu128, 2.7.1, 2.7.1+cu128, 2.8.0, 2.9.0, 2.9.0+cu128)
ERROR: No matching distribution found for torch==2.8.0+cu128
ERROR conda.cli.main_run:execute(127): conda run python -m pip install --upgrade --upgrade-strategy eager -r requirements-global.txt -r requirements-cuda.txt failed. (See above for error)
We cannot help much here without a Spark, but these are basics. You should be able to find out on the internet how to install torch for the Spark. Just from the error message, torch 2.8.0+cu128 is not among the available versions, but torch 2.9.0+cu128 is. Maybe the Spark requires that. It's currently not supported by OneTrainer, but only because it's quite new. You can try.
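To read that version list without squinting, the `from versions:` part of the pip error can be filtered by its local tag. This is only a minimal sketch: `cuda_builds` is a hypothetical helper name (not part of pip or OneTrainer), and the sample line is shortened from the log above.

```python
import re


def cuda_builds(error_line: str, tag: str = "cu128") -> list[str]:
    """Return the versions from pip's 'from versions:' list that carry the given local tag."""
    match = re.search(r"\(from versions: ([^)]*)\)", error_line)
    if match is None:
        return []
    versions = [v.strip() for v in match.group(1).split(",")]
    return [v for v in versions if v.endswith("+" + tag)]


# Error line shortened from the log above (older versions dropped for brevity)
line = (
    "ERROR: Could not find a version that satisfies the requirement "
    "torch==2.8.0+cu128 (from versions: 2.7.0, 2.7.0+cu128, 2.7.1, "
    "2.7.1+cu128, 2.8.0, 2.9.0, 2.9.0+cu128)"
)
print(cuda_builds(line))  # ['2.7.0+cu128', '2.7.1+cu128', '2.9.0+cu128']
```

Going by this log, 2.8.0 is the only release in that range that pip lists without a +cu128 build, which is why the pin from the pull request fails here.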
OK, I managed to make it work by installing PyTorch 2.9 and some other random dependencies. It runs, though it looks weird. I'll try an SDXL training today and confirm whether it works out.
OK, it works. I've had a couple of really weird trainings, but I guess that was just the settings. OT can be made to work on the DGX Spark by forcing different versions of PyTorch and other dependencies.
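Since some packages may have ended up as CPU-only builds, it can help to confirm that torch actually sees the GPU before judging training results. A rough sketch, assuming it is run inside the same environment OneTrainer uses; `environment_summary` is a hypothetical helper name, and the torch attributes used are the standard ones (`torch.version.cuda`, `torch.cuda.is_available()`).

```python
import platform


def environment_summary() -> dict:
    """Collect CPU architecture, Python version and torch/CUDA status."""
    info = {"machine": platform.machine(), "python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # torch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


print(environment_summary())
```

If `cuda_available` comes back `False`, training would be falling back to the CPU, which could explain odd behaviour.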
Can you post what you did? If you know how to do that, you can also open a pull request. If you don't remember what you've changed, please post the output of
source venv/bin/activate
pip freeze
if you're on Linux. If you're on Windows, you have to adjust the first line.
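Once the freeze output is available, the changed pins can be picked out by comparing it with the requirements files. A minimal sketch, assuming both are plain `name==version` text; `parse_pins` and `changed_pins` are hypothetical helper names, and the sample pins are taken from the log earlier in this thread.

```python
def parse_pins(text: str) -> dict[str, str]:
    """Map normalized package name -> pinned version for every 'name==version' line."""
    pins = {}
    for raw in text.splitlines():
        # Drop trailing comments and environment markers
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line or line.startswith("-"):
            continue  # skip blanks, options and editable installs
        name, version = line.split("==", 1)
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def changed_pins(requirements: str, freeze: str) -> dict[str, tuple[str, str]]:
    """Return {name: (required, installed)} for packages whose versions differ."""
    wanted, installed = parse_pins(requirements), parse_pins(freeze)
    return {
        name: (wanted[name], installed[name])
        for name in wanted
        if name in installed and wanted[name] != installed[name]
    }


# Sample pins taken from the install log in this thread
requirements = "torch==2.7.1+cu128\ntorchvision==0.22.1+cu128\npsutil==7.0.0\n"
freeze = "psutil==7.0.0\ntorch==2.9.0+cu128\ntorchvision==0.24.0\n"
print(changed_pins(requirements, freeze))
```

Editable installs (`-e git+...`) are skipped on purpose, so changes to those would have to be compared by commit hash separately.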
Okay! But what I did was fairly ad hoc, and this copy of OT is likely unstable right now, besides looking funny. Here are the installed packages. I basically installed whatever versions would fit the env + conda, and used CPU builds when there wasn't a GPU version available.
pip freeze
absl-py==2.3.1
accelerate==1.7.0
adv_optm==1.1.3
aiodns==3.5.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.2
aiohttp-retry==2.9.1
aiosignal==1.4.0
annotated-doc==0.0.3
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.11.0
attrs==25.4.0
av==14.4.0
backoff==2.2.1
backports.zstd==1.0.0
bcrypt==5.0.0
bitsandbytes==0.46.0
boto3==1.40.66
botocore==1.40.66
Brotli==1.1.0
certifi==2025.10.5
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.0
cloudpickle==3.1.2
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.3
cryptography==45.0.7
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
decorator==5.2.1
deepdiff==8.6.1
Deprecated==1.3.1
-e git+https://github.com/huggingface/diffusers.git@9b721db205729d5a6e97a72312c3a0f4534064f1#egg=diffusers
dnspython==2.8.0
email-validator==2.3.0
fabric==3.2.2
fastapi==0.121.0
fastapi-cli==0.0.14
fastapi-cloud-cli==0.3.1
filelock==3.19.1
flatbuffers==25.9.23
fonttools==4.60.1
frozenlist==1.8.0
fsspec==2025.9.0
ftfy==6.3.1
gguf==0.17.1
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface-hub==0.34.4
humanfriendly==10.0
idna==3.11
imagesize==1.4.1
importlib_metadata==8.7.0
inquirerpy==0.3.4
invisible-watermark==0.2.0
invoke==2.2.1
itsdangerous==2.2.0
Jinja2==3.1.6
jmespath==1.0.1
kiwisolver==1.4.9
lightning-utilities==0.15.2
lion-pytorch==0.2.3
Markdown==3.10
markdown-it-py==4.0.0
MarkupSafe==2.1.5
matplotlib==3.10.3
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@50a2394c626c3307f9091b6a1831ae80fe3f2237#egg=mgds
mpmath==1.3.0
multidict==6.7.0
networkx==3.5
numpy==2.2.6
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-ml-py==13.580.82
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
omegaconf==2.3.0
-e git+https://github.com/Open-Model-Initiative/OMI-Model-Standards.git@f14b1da606811d2004f9241c3463c240eaf09ac5#egg=omi_model_standards
onnxruntime==1.22.0
open_clip_torch==2.32.0
opencv-python==4.11.0.86
orderly-set==5.5.0
orjson==3.11.4
packaging==25.0
paramiko==4.0.0
pfzy==0.3.4
pillow==11.3.0
platformdirs==4.5.0
pooch==1.8.2
prettytable==3.16.0
prodigy-plus-schedule-free==2.0.1
prodigyopt==1.1.2
prompt_toolkit==3.0.52
propcache==0.4.1
protobuf==6.33.0
psutil==7.0.0
py-cpuinfo==9.0.0
pycares==4.11.0
pycparser==2.23
pydantic==2.12.4
pydantic-extra-types==2.10.6
pydantic-settings==2.11.0
pydantic_core==2.41.5
Pygments==2.19.2
PyNaCl==1.6.0
pyparsing==3.2.5
python-dateutil==2.9.0.post0
python-dotenv==1.2.1
python-multipart==0.0.20
pytorch-lightning==2.5.1.post0
pytorch_optimizer==3.6.0
PyWavelets==1.9.0
PyYAML==6.0.2
regex==2025.11.3
requests==2.32.3
rich==14.2.0
rich-toolkit==0.15.1
rignore==0.7.5
runpod==1.7.10
s3transfer==0.14.0
safetensors==0.5.3
scalene==1.5.51
scenedetect==0.6.6
schedulefree==1.4.1
scipy==1.15.3
sentencepiece==0.2.0
sentry-sdk==2.43.0
setuptools==70.2.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.49.3
sympy==1.14.0
tensorboard==2.19.0
tensorboard-data-server==0.7.2
timm==1.0.22
tokenizers==0.21.4
tomli==2.3.0
tomlkit==0.13.3
torch==2.9.0+cu128
torchmetrics==1.8.2
torchvision==0.24.0
tqdm==4.67.1
tqdm-loggable==0.2
transformers==4.52.4
triton==3.5.0
typer==0.20.0
typer-slim==0.20.0
typing-inspection==0.4.2
typing_extensions==4.15.0
ujson==5.11.0
urllib3==2.5.0
uvicorn==0.38.0
uvloop==0.22.1
watchdog==6.0.0
watchfiles==1.1.1
wcwidth==0.2.14
websockets==15.0.1
Werkzeug==3.1.3
wheel==0.45.1
wrapt==2.0.0
yarl==1.22.0
yt-dlp==2025.10.22
zipp==3.23.0
This probably did it:
> torch==2.9.0+cu128
> transformers==4.52.4
> triton==3.5.0
If that's right, this has to wait for the upgrade to torch 2.9. It's still too new for us to upgrade anytime soon; torch is working on a 2.9.1 release.