
Add Python 3.12 support

Open Panchovix opened this issue 4 months ago • 3 comments

Just updated some requirements, as was done on reForge. Quickly tested some XL models and they seem to work so far.

Testers are welcome.

Panchovix avatar Jul 27 '25 22:07 Panchovix

reForge already has the fix, so it can probably be copied to this PR; note that bumping the Pillow version will cause the XYZ Plot script to fail, because the `multiline_textsize` function was deprecated and later removed. https://github.com/Panchovix/stable-diffusion-webui-reForge/blob/20ddc5f80a7bb2c336f55f4b0ddcb2125495f7d7/modules/images.py#L169

MisterChief95 avatar Jul 30 '25 01:07 MisterChief95

@MisterChief95 nice catch, I didn't remember doing that commit 1+ year ago lol. Did the change now.

Panchovix avatar Jul 31 '25 01:07 Panchovix

In addition to supporting Python 3.12:

**To squeeze the most total processing power out of a single PC, there are several possible routes at several layers, at least the following:**

  1. Beyond Python 3.12, the free-threaded (no-GIL) Python builds: 3.13t, 3.14t, the pre-GA 3.15t, ...?

  2. Python's distributed data-parallel processing libraries, at different granularities, such as Ray and Dask, running on a single machine with heterogeneous xPUs (dGPU, iGPU, NPU, ...)?

  3. PyTorch's DDP (Distributed Data Parallel) and FSDP2 (Fully Sharded Data Parallel), via torch.multiprocessing, torch.distributed, Monarch, etc.?

  4. GPU driver stacks such as CUDA 12.9 and 13.0 for NVIDIA's various chips do not fully exploit the processing power of recent hardware; in the worst case, as little as 30% is used.

  • In the case of NVIDIA chips, Triton, the Python-like language developed by OpenAI, could drive them much faster for the specialized processing inside Forge and SD-based projects. Triton also ships an autotuning optimizer, which means less effort and less code, similar to the Python libraries and the PyTorch DDP/FSDP2 functionality above.

  • Similarly, Intel maintains a specially tuned fork of PyTorch (Intel Extension for PyTorch) for its XPUs, which upstream PyTorch is gradually merging.

  • AMD provides ROCm, and third parties are developing alternatives as well. Currently, Forge ships a generic library that is not optimized for AMD chips.
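The GIL question in item 1 and the single-machine data-parallel idea in items 2–3 can be illustrated without Ray or PyTorch. The sketch below (the names and the toy workload are mine, purely illustrative) checks whether the interpreter is a free-threaded build and fans a batch out across a thread pool; on a conventional GIL build the CPU-bound part will not actually run in parallel, which is exactly what the `t` builds change:

```python
import sysconfig
from concurrent.futures import ThreadPoolExecutor

def is_free_threaded():
    # Py_GIL_DISABLED is 1 on free-threaded ("3.13t"-style) builds,
    # and 0 or None on conventional GIL builds.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

def render_tile(seed):
    # Stand-in for a per-tile / per-image CPU-bound step.
    acc = 0
    for i in range(10_000):
        acc = (acc + seed * i) % 1_000_003
    return acc

def render_batch(seeds, workers=4):
    # Data parallelism inside one process: each seed goes to one worker.
    # Threads only speed this up on a no-GIL build; a GIL build would
    # need ProcessPoolExecutor (or Ray/Dask) instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_tile, seeds))

results = render_batch([1, 2, 3, 4])
```

The same split-the-batch pattern is what Ray, Dask, and torch DDP generalize across processes and devices.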

**As Intel's answer, the oneAPI libraries (oneMKL, oneDNN, OpenVINO, etc.) expose APIs that mirror PyTorch's, but they are implemented in C++ to achieve higher performance. They can drive xPUs made not only by Intel but also by AMD and NVIDIA at the same time.**
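One pragmatic first step toward any of these heterogeneous setups, whichever vendor stack wins, is runtime probing: detect which acceleration stacks are actually importable and dispatch per task. A toy sketch (the module names are real PyPI packages, but the preference list and selection logic are mine):

```python
from importlib.util import find_spec

# Candidate acceleration stacks, in a rough order of preference.
# Which ones are present depends entirely on the local install.
CANDIDATE_BACKENDS = [
    ("torch", "PyTorch (CUDA / ROCm / XPU depending on the build)"),
    ("openvino", "Intel OpenVINO"),
    ("triton", "OpenAI Triton"),
]

def available_backends():
    # find_spec() checks importability without paying the import cost.
    return [name for name, _desc in CANDIDATE_BACKENDS
            if find_spec(name) is not None]

backends = available_backends()
```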

Are there any plans to deploy additional optimizers?

lcretan avatar Nov 17 '25 09:11 lcretan