sd-scripts

"The paging file is too small for this operation to complete."

Open UBBK opened this issue 3 years ago • 13 comments

This is the error I get while trying to train with the script. Is my 1070 Ti with 8 GB of VRAM the issue, did I mess up applying the script, or is it a dependency issue?

steps:   0%|                                                                                    | 0/25 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 62, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The paging file is too small for this operation to complete.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\sd-scripts\lora_train_popup.py", line 8, in <module>
    import train_network
  File "C:\Users\...\sd-scripts\train_network.py", line 9, in <module>
    from accelerate.utils import set_seed
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 27, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\checkpointing.py", line 24, in <module>
    from .utils import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\utils\__init__.py", line 103, in <module>
    from .megatron_lm import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\utils\megatron_lm.py", line 32, in <module>
    from transformers.modeling_outputs import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\dependency_versions_check.py", line 17, in <module>
    from .utils.versions import require_version, require_version_core
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\utils\__init__.py", line 34, in <module>
    from .generic import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\utils\generic.py", line 33, in <module>
    import tensorflow as tf
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\__init__.py", line 36, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 77, in <module>
    raise ImportError(
ImportError: Traceback (most recent call last):
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 62, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The paging file is too small for this operation to complete.


Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors for some common causes and solutions.
If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message.

UBBK avatar Jan 13 '23 08:01 UBBK

Try increasing your pagefile size:

https://mcci.com/support/guides/how-to-change-the-windows-pagefile-size/

brucethemoose avatar Jan 13 '23 16:01 brucethemoose

Didn't help. I'm still getting the paging file error even after doubling the pagefile size.

  CUDA SETUP: Loading binary C:\Users\...\sd-scripts\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 5
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 5
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 5
steps:   0%|                                                                                     | 0/5 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\sd-scripts\lora_train_popup.py", line 8, in <module>
    import train_network
  File "C:\Users\...\sd-scripts\train_network.py", line 8, in <module>
    import torch
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\torch\__init__.py", line 129, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\...\sd-scripts\venv\lib\site-packages\torch\lib\cusolver64_11.dll" or one of its dependencies.

UBBK avatar Jan 13 '23 17:01 UBBK

I increased the page file size to the maximum I can, and now I am getting a different error.

================================================================================
CUDA SETUP: Loading binary C:\Users\...\sd-scripts\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 5
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 5
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 5
steps:   0%|                                                                                     | 0/5 [00:00<?, ?it/s]epoch 1/1
Error no kernel image is available for execution on the device at line 89 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\sd-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\...\\sd-scripts\\venv\\Scripts\\python.exe', 'lora_train_popup.py']' returned non-zero exit status 1.

UBBK avatar Jan 13 '23 19:01 UBBK

Ah yeah, that is a bitsandbytes library error. It is really finicky about detecting and using your CUDA install. You might want to look through the issues here: https://github.com/TimDettmers/bitsandbytes

Or just disable the 8-bit optimizer?

I had to add two environment variables to get it working on Linux, but I never got that specific error.
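
For context, "no kernel image is available for execution on the device" generally means the compiled CUDA binary contains no code for the GPU's compute capability (a GTX 1070 Ti is 6.1). Below is a minimal sketch for checking what PyTorch reports, assuming torch is installed in the venv. The helper name `describe_gpu` is made up for illustration and is not part of sd-scripts or bitsandbytes.

```python
def describe_gpu():
    """Return (name, (major, minor)) for GPU 0, or None if it cannot be queried."""
    try:
        import torch  # assumed to be installed in the sd-scripts venv
    except (ImportError, OSError):
        # OSError covers DLL load failures such as WinError 1455
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    info = describe_gpu()
    if info is None:
        print("No usable CUDA device (or torch failed to import)")
    else:
        name, (major, minor) = info
        print(f"{name}: compute capability {major}.{minor}")
```

If the reported capability is one the bitsandbytes DLL was not built for, swapping the DLL or disabling the 8-bit optimizer are the two options raised in this thread.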

brucethemoose avatar Jan 13 '23 20:01 brucethemoose

Wasn't the 8-bit optimizer meant for weaker GPUs? Mine is a 1070 Ti with 8 GB of VRAM, so I don't think I can run this without the 8-bit optimizer.

UBBK avatar Jan 13 '23 20:01 UBBK

This comment fixes the bitsandbytes issue: https://github.com/kohya-ss/sd-scripts/issues/44#issuecomment-1375690372

I've also run into this paging file issue. After doing some profiling, I found that at the start of each epoch the program tries to spawn 7 extra worker threads, each of which takes up 4 GB of RAM in my case, and then kills them all off when the epoch ends.

Setting the page file sufficiently large is a fix, but I'm not sure that:

  • Each thread needs the 4 GB it's consuming (I'm not even sure it uses it for anything)
  • These threads need to be killed off after each epoch (spawning them takes a significant amount of time)
  • There need to be that many threads in the first place, given that they consume so many resources.
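
To put rough numbers on this: the traceback at the top of the thread goes through multiprocessing\spawn.py, which suggests the workers are separate processes that each re-import train_network (and torch with it). Windows has to back all of that committed memory with RAM plus pagefile. Here is a back-of-the-envelope sketch using the figures reported above. The function name is a placeholder, and the 2-worker case is a hypothetical comparison, not a measurement.

```python
def extra_commit_gb(num_workers: int, gb_per_worker: float) -> float:
    """Memory the worker processes alone ask the OS to commit, in GB."""
    return num_workers * gb_per_worker


reported = extra_commit_gb(7, 4.0)   # 7 workers at ~4 GB each, as reported above
reduced = extra_commit_gb(2, 4.0)    # hypothetical smaller worker count
print(f"7 workers: ~{reported:.0f} GB, 2 workers: ~{reduced:.0f} GB")
```

On these assumptions the workers alone account for roughly 28 GB of commit, which would explain why a small pagefile is exhausted before training starts.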

I haven't looked at the code base yet, nor am I familiar with programming ML in Python, so I'm not sure I can find the issue and submit a PR, but it does seem like a pretty big issue that should be addressed.

TheDevelo avatar Jan 13 '23 20:01 TheDevelo

@TheDevelo Are you spawning 8 threads with the accelerate command?

Yeah, this could probably be dialed back, as I don't think training will be very CPU bound.

brucethemoose avatar Jan 13 '23 20:01 brucethemoose

Normally yes, but I've also tried lowering the number of threads and it still spawns 8, so it seems to be independent of accelerate.

TheDevelo avatar Jan 13 '23 20:01 TheDevelo

Hmm... The new DLL didn't seem to change anything. I'm still getting the same error.

UBBK avatar Jan 13 '23 21:01 UBBK

@TheDevelo If running LoRA, try changing the line here to some static number (2?): https://github.com/kohya-ss/sd-scripts/blob/bf691aef69d883e4d9e61104609b479ba3be9aad/train_network.py#L129

@UBBK Try running python -m bitsandbytes in the console. It should spit out some more useful debug info.

brucethemoose avatar Jan 13 '23 21:01 brucethemoose

Oh, never mind. Doing this AND increasing the paging file size at the same time solved the issue.

UBBK avatar Jan 13 '23 21:01 UBBK

@TheDevelo If running LoRA, try changing the line here to some static number (2?):

https://github.com/kohya-ss/sd-scripts/blob/bf691aef69d883e4d9e61104609b479ba3be9aad/train_network.py#L129

Yep, seems that's the issue. I changed it to 1 and now it only spawns 1 worker thread, which reduced RAM usage significantly. The time between epochs is reduced too, since spawning each thread took a good amount of time. It even takes less time per step.
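
A sketch of the kind of change being described, assuming the linked line derives the DataLoader worker count from the CPU core count with an upper cap (the actual line is not reproduced in this thread). `pick_worker_count` and its parameters are hypothetical names, and the 8-core machine is an assumed example.

```python
import os
from typing import Optional


def pick_worker_count(cap: int, cpu_count: Optional[int] = None) -> int:
    """Use at most `cap` workers and leave one core free for the main process."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(0, min(cap, cpu_count - 1))


# Assumed 8-core machine: a cap of 8 gives 7 workers, a cap of 1 gives 1.
print(pick_worker_count(cap=8, cpu_count=8))  # 7
print(pick_worker_count(cap=1, cpu_count=8))  # 1
```

Under that assumption, replacing the cap with a small static number is what brings the count down from 7 to 1. With a PyTorch DataLoader, a worker count of 0 means the data is loaded in the main process with no extra workers at all.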

TheDevelo avatar Jan 13 '23 22:01 TheDevelo

I've added a --max_data_loader_n_workers option to specify the number of workers. Please try it.
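
A minimal sketch of how such an option could be wired up with argparse. Only the flag name comes from the comment above. The default of 8, the help text, and the sample arguments are assumptions for illustration, not the repository's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="worker-count option sketch")
    parser.add_argument(
        "--max_data_loader_n_workers",
        type=int,
        default=8,  # assumed default, not confirmed by the thread
        help="upper bound on the number of DataLoader worker processes",
    )
    return parser


# Simulate passing the flag on the command line with a value of 1.
args = build_parser().parse_args(["--max_data_loader_n_workers", "1"])
print(args.max_data_loader_n_workers)  # 1
```

On a machine that hits the paging file error, passing a low value such as 1 would be the equivalent of the manual edit described earlier in the thread.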

kohya-ss avatar Jan 15 '23 04:01 kohya-ss