OneTrainer

[Bug]: train device cuda:1

Open • ToJl9TopTonop opened this issue 10 months ago • 4 comments

What happened?


Setting the train device to cuda:1 does not work. Setting it to cuda:0 or to cuda works.

Simple workaround (a crutch): add the line set CUDA_VISIBLE_DEVICES=1 to start-ui.bat.

What did you expect would happen?

I expected cuda:1 to select the Tesla P40. The machine has three video cards:

- RTX 4060 Ti 16 GB: cuda:0 or CUDA_VISIBLE_DEVICES=0
- Tesla P40 24 GB: cuda:1 or CUDA_VISIBLE_DEVICES=1 (this is the one I need to select)
- Tesla P4 8 GB: cuda:2 or CUDA_VISIBLE_DEVICES=2
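The start-ui.bat workaround works because CUDA_VISIBLE_DEVICES hides every GPU not listed and renumbers the listed ones cuda:0, cuda:1, ... in the order given. With CUDA_VISIBLE_DEVICES=1, the Tesla P40 therefore becomes cuda:0 (and bare cuda) inside OneTrainer. The minimal sketch below models that renumbering. logical_to_physical is a hypothetical helper, not part of OneTrainer or PyTorch, and it assumes the variable holds comma-separated integer ids (CUDA also accepts GPU UUIDs, which the sketch ignores).

```python
from typing import Optional


def logical_to_physical(logical_index: int, visible: Optional[str]) -> int:
    """Map cuda:<logical_index> to a physical GPU id (hypothetical helper).

    visible is the value of CUDA_VISIBLE_DEVICES, or None when it is unset.
    Assumption: the value is a comma-separated list of integer ids; GPU UUIDs,
    which CUDA also accepts, are not handled by this sketch.
    """
    if visible is None:
        # Unset: every GPU is visible and the numbering is unchanged.
        return logical_index
    # Listed GPUs are renumbered 0, 1, 2, ... in the order they appear.
    ids = [int(part) for part in visible.split(",") if part.strip()]
    if not 0 <= logical_index < len(ids):
        raise ValueError(
            f"cuda:{logical_index} is not visible with CUDA_VISIBLE_DEVICES={visible!r}"
        )
    return ids[logical_index]


# With the start-ui.bat workaround, cuda:0 is physical GPU 1 (the Tesla P40).
print(logical_to_physical(0, "1"))  # prints 1
```

The variable only takes effect if it is set before the process initializes CUDA, which is why putting it in the launcher script works. Also note that, unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, CUDA enumerates GPUs fastest first, so these ids may not match the order nvidia-smi shows.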

Relevant log output

Exception in thread Thread-1 (__training_thread_function):
Traceback (most recent call last):
  File "C:\Users\IBers\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\IBers\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\OneTrainer-master\modules\ui\TrainUI.py", line 475, in __training_thread_function
    ZLUDA.initialize_devices(self.train_config)
  File "D:\OneTrainer-master\modules\zluda\ZLUDA.py", line 34, in initialize_devices
    if not is_zluda(config.train_device) and not is_zluda(config.temp_device):
  File "D:\OneTrainer-master\modules\zluda\ZLUDA.py", line 12, in is_zluda
    return torch.cuda.get_device_name(device).endswith("[ZLUDA]")
  File "D:\OneTrainer-master\venv\lib\site-packages\torch\cuda\__init__.py", line 423, in get_device_name
    return get_device_properties(device).name
  File "D:\OneTrainer-master\venv\lib\site-packages\torch\cuda\__init__.py", line 456, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
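In recent PyTorch versions, torch.cuda.get_device_properties raises AssertionError("Invalid device id") when the resolved index is negative or not less than torch.cuda.device_count(). Since index 1 is not negative, this traceback means fewer than two CUDA devices were visible to the training thread when cuda:1 was requested. The sketch below is a hypothetical pre-flight check (check_cuda_device is not an existing OneTrainer function) that applies the same bounds test but fails with a readable message. The device count is passed in as a parameter so the sketch runs without PyTorch or a GPU.

```python
def check_cuda_device(device: str, device_count: int) -> int:
    """Return the index for a 'cuda' or 'cuda:N' string, or raise ValueError.

    Hypothetical helper: in real code device_count would come from
    torch.cuda.device_count().
    """
    kind, _, index_text = device.partition(":")
    if kind != "cuda":
        raise ValueError(f"not a CUDA device: {device!r}")
    # Assumption: bare "cuda" means device 0 (PyTorch actually uses the
    # current device, which is 0 unless it has been changed).
    index = int(index_text) if index_text else 0
    # Same bounds test get_device_properties applies before its assertion.
    if index < 0 or index >= device_count:
        raise ValueError(
            f"{device!r} is out of range: {device_count} CUDA device(s) visible"
        )
    return index
```

With all three cards visible, check_cuda_device("cuda:1", 3) returns 1. Under the CUDA_VISIBLE_DEVICES=1 workaround only one device is visible, so the same call with a count of 1 raises, and the train device should then be set to cuda or cuda:0.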

Output of pip freeze

No response

ToJl9TopTonop, Apr 17 '24 12:04