[Bug]: Double the RAM usage after upgrading to Torch 2.8
What happened?
After updating to the latest commit I got an OOM while trying to train a LoRA for Qwen Image in bf16 with CPU offloading fraction 1.0. During caching the model loaded as usual, with about 80 GB of RAM used. Then, once the actual training steps started, RAM usage kept climbing until OneTrainer crashed. I tried manually reverting to the commit that worked before, but got an OOM again. After downgrading Torch to 2.7.1 the issue disappeared.
What did you expect would happen?
Not getting an OOM with the same config that worked before.
Relevant log output
Starting UI...
C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\default.py:30: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 1173.66it/s]
TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.19.0 at http://localhost:6006/ (Press CTRL+C to quit)
The config attributes {'pooled_projection_dim': 768} were passed to QwenImageTransformer2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Selected layers: 720
Deselected layers: 126
Note: Enable Debug mode to see the full list of layer names
Exception in thread Reloader:
Traceback (most recent call last):
File "C:\Users\ulexe\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "C:\Users\ulexe\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\data_ingester.py", line 108, in _reload
self._multiplexer.Reload()
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\plugin_event_multiplexer.py", line 263, in Reload
Worker()
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\plugin_event_multiplexer.py", line 241, in Worker
accumulator.Reload()
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\plugin_event_accumulator.py", line 202, in Reload
for event in self._generator.Load():
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\directory_watcher.py", line 88, in Load
for event in self._LoadInternal():
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\directory_watcher.py", line 118, in _LoadInternal
for event in self._loader.Load():
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\event_file_loader.py", line 270, in Load
for event in super().Load():
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\event_file_loader.py", line 244, in Load
for record in super().Load():
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\event_file_loader.py", line 178, in Load
yield next(self._iterator)
^^^^^^^^^^^^^^^^^^^^
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\backend\event_processing\event_file_loader.py", line 109, in __next__
self._reader.GetNext()
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\compat\tensorflow_stub\pywrap_tensorflow.py", line 207, in GetNext
header_str = self._read(8)
^^^^^^^^^^^^^
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\compat\tensorflow_stub\pywrap_tensorflow.py", line 273, in _read
new_data = self.file_handle.read(n)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py", line 736, in read
(self.buff, self.continuation_token) = self.fs.read(
^^^^^^^^^^^^^
File "C:\AI\OneTrainer\OneTrainer\venv\Lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py", line 141, in read
data = f.read(size)
^^^^^^^^^^^^
MemoryError
Generate and upload debug_report.log
No response
Cannot reproduce. 34 GB RAM during caching, 38 GB RAM during training, using the Qwen 16 GB full finetuning preset but with offloading fraction 1.0 as you described. Please try to reproduce using the preset. If it doesn't happen then, please post your config.
I was getting this issue with LoRA training, not finetuning. Though the same happens with the Qwen 16 GB full finetuning default preset with offloading set to 1.0.
Torch 2.7.1 - training works fine
Torch 2.8 - during "running model setup" OneTrainer crashes
Only the Torch and Torchvision versions were different; I tested with the same commit and config.
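For reference, a minimal way to confirm which wheels each run is actually using (a sketch; run it with the venv's Python):

```python
# Print the Torch/Torchvision versions and the CUDA build in use,
# so the two test runs can be matched to the exact wheels installed.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA build:", torch.version.cuda)
```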
The images above are from the default preset.
You've mentioned you are using bf16 and an offloading fraction of 1.0; neither of those is the preset. Can you just post your config, please?
Sure config.json
The above testing was done with "#qwen Finetune 16GB", which has bfloat16 for the prior and float8 for the text encoder by default, with offloading changed to 1.0. The OOM happens with both presets.
Please try with the actual unmodified preset instead, so that we have a control. It only needs to be for a single epoch.
I did. I got an OOM with "#qwen Finetune 16GB", unmodified except for changing CPU offloading from 0.75 to 1.0. It crashed before caching, during "running model setup", with Torch 2.8. The same preset worked fine with Torch 2.7.1; I let it run for 5 steps before stopping.
With your config: about 67 GB of system RAM+swap during caching and about 81 GB of system RAM+swap during training (total OS values, not only OneTrainer). At your settings this is not surprising: with both the transformer and the TE at bf16, the raw model size is about 56 GB.
If you have a "crash", please elaborate on what that means and post an error log; your RAM wasn't even full. If you see a difference between Torch 2.7.1 and Torch 2.8.0, can you please try to reproduce it at more reasonable settings? I have 64 GB of RAM and prefer not to do multiple tests at these settings. If you can show that the preset uses more RAM on Torch 2.8, I'll look into it.
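For context, a rough back-of-envelope check of the 56 GB figure above; the parameter counts are approximate assumptions (roughly 20B for the Qwen Image transformer and about 8B for the Qwen2.5-VL text encoder), not exact values:

```python
# Rough size estimate: parameters x 2 bytes per bf16 weight.
# Both parameter counts below are approximations, not exact numbers.
transformer_params = 20e9   # ~20B Qwen Image transformer (assumed)
text_encoder_params = 8e9   # ~8B Qwen2.5-VL text encoder (assumed)
bytes_per_param = 2         # bf16

total_gb = (transformer_params + text_encoder_params) * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB of raw bf16 weights")  # roughly matches the quoted 56 GB
```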
I realize it may not be clear from the picture, but where the graph is flat my PC freezes for a while, and then Windows kills OneTrainer, presumably because it requested more RAM than I have in RAM+swap.
The error log was posted in the very first message; it's the same for all crashes.
I will try LoRA training at float8 then.
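One way to get directly comparable numbers for the two Torch versions is to log memory from a small side script while OneTrainer runs; a sketch, not part of OneTrainer, assuming psutil is installed and OneTrainer's PID is passed as the first argument:

```python
# Log total system RAM usage and the RSS of a given process (e.g. OneTrainer)
# once per second, so runs on Torch 2.7.1 and 2.8 can be compared directly.
import sys
import time

import psutil

proc = psutil.Process(int(sys.argv[1]))  # PID of the OneTrainer process

while True:
    vm = psutil.virtual_memory()
    used_gb = (vm.total - vm.available) / 1024**3
    rss_gb = proc.memory_info().rss / 1024**3
    print(f"{time.strftime('%H:%M:%S')}  system used: {used_gb:5.1f} GB  "
          f"process RSS: {rss_gb:5.1f} GB", flush=True)
    time.sleep(1)
```

Running the same preset once per Torch version and keeping the two logs side by side would show whether the extra usage appears during model setup, caching, or the training steps.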
2.8 model loading
2.7.1 model loading
2.8 training
2.7.1 training
LoRA training with the transformer and text encoder at float8: about 17 GB more RAM used during training on Torch 2.8.
Cannot reproduce. 39 GB during training, never above 40 GB during loading; those are total OS values again. No discernible difference between Torch 2.8 and Torch 2.7.1.
If this is an issue introduced by torch 2.8, it must be Windows-only or in some other way specific to your system.
Using the default profile -> no issues, on both Windows and Linux. There is no VRAM usage increase with Torch 2.8, just a small bump in training speed.
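For anyone reproducing the VRAM comparison, a minimal sketch of how peak usage can be read from PyTorch (this only covers memory allocated through PyTorch's caching allocator, not total GPU usage):

```python
# Report PyTorch's view of current and peak VRAM usage on the default CUDA device.
import torch

if torch.cuda.is_available():
    allocated_gb = torch.cuda.memory_allocated() / 1024**3
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"allocated: {allocated_gb:.2f} GB, peak: {peak_gb:.2f} GB")
```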