OneTrainer
[Bug]: train device cuda:1
What happened?
Setting the train device to cuda:1 does not work. Setting it to cuda:0 works, and so does plain cuda.
Simple workaround (a crutch):
In start-ui.bat, add the line set CUDA_VISIBLE_DEVICES=1
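Why the workaround helps can be sketched in plain Python. CUDA_VISIBLE_DEVICES hides every GPU not listed in it and renumbers the remaining ones from 0, so with the value 1 the second physical card becomes cuda:0 inside the process. The function below is only an illustrative model, not OneTrainer or PyTorch code: the name visible_devices is hypothetical, the list order is assumed to match the physical indices given in this report, and UUID-style entries are ignored.

```python
def visible_devices(physical, cuda_visible_devices=None):
    """Model which GPUs a process sees and in what order it indexes them.

    physical: device names ordered by physical index (assumed ordering).
    cuda_visible_devices: value of the environment variable, None if unset.
    """
    if cuda_visible_devices is None:
        return list(physical)  # variable unset: every device is visible
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        # CUDA ignores everything from the first invalid entry onward.
        if not token.isdigit() or int(token) >= len(physical):
            break
        visible.append(physical[int(token)])
    return visible


# Cards as listed in this report (assumed physical order).
gpus = ["RTX 4060 Ti 16GB", "Tesla P40 24GB", "Tesla P4 8GB"]

# With CUDA_VISIBLE_DEVICES=1 only the P40 remains, and it is index 0.
only_p40 = visible_devices(gpus, "1")
```

In this model, train device cuda or cuda:0 selects the Tesla P40 once the variable is set, and cuda:1 no longer exists in that process.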
What did you expect would happen?
I expected to be able to select the second card directly. The machine has three video cards:
RTX 4060 Ti 16GB - cuda:0 or CUDA_VISIBLE_DEVICES=0
Tesla P40 24GB - cuda:1 or CUDA_VISIBLE_DEVICES=1 <- I need to select this one
Tesla P4 8GB - cuda:2 or CUDA_VISIBLE_DEVICES=2
Relevant log output
Exception in thread Thread-1 (__training_thread_function):
Traceback (most recent call last):
  File "C:\Users\IBers\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\IBers\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\OneTrainer-master\modules\ui\TrainUI.py", line 475, in __training_thread_function
    ZLUDA.initialize_devices(self.train_config)
  File "D:\OneTrainer-master\modules\zluda\ZLUDA.py", line 34, in initialize_devices
    if not is_zluda(config.train_device) and not is_zluda(config.temp_device):
  File "D:\OneTrainer-master\modules\zluda\ZLUDA.py", line 12, in is_zluda
    return torch.cuda.get_device_name(device).endswith("[ZLUDA]")
  File "D:\OneTrainer-master\venv\lib\site-packages\torch\cuda\__init__.py", line 423, in get_device_name
    return get_device_properties(device).name
  File "D:\OneTrainer-master\venv\lib\site-packages\torch\cuda\__init__.py", line 456, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
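The traceback shows the assertion being raised while is_zluda queries the device name, before training starts. As far as I can tell, PyTorch raises "Invalid device id" when the requested index falls outside the range of devices it reports for the process. A minimal sketch of a pre-check follows. The helper is_valid_cuda_device is hypothetical and not part of OneTrainer or PyTorch, and the device count is passed in as a plain integer here; in real code it would presumably come from torch.cuda.device_count().

```python
def is_valid_cuda_device(device: str, device_count: int) -> bool:
    """Return True if a device string names a CUDA device that exists.

    device: a string such as "cuda", "cuda:0" or "cuda:1".
    device_count: number of CUDA devices visible to the process
                  (assumed to be obtained from torch.cuda.device_count()).
    """
    kind, sep, index = device.partition(":")
    if kind != "cuda":
        return False  # "cpu" and other device types are not CUDA devices
    if not sep:
        return device_count > 0  # plain "cuda" means the current device
    # An explicit index must be a non-negative integer below the count.
    return index.isdigit() and int(index) < device_count


# If the process only sees one device, cuda:1 is out of range.
one_visible = is_valid_cuda_device("cuda:1", 1)
# With all three cards visible, cuda:1 would be accepted.
three_visible = is_valid_cuda_device("cuda:1", 3)
```

A check like this before the call to get_device_name would turn the crash into a readable error and would show how many devices the process actually sees.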
Output of pip freeze
No response