yoyodyne
TQDM Error with multi GPU Transducer
Multi-GPU training with the edit action transducer fails with the following traceback:
Traceback (most recent call last):
File "/home/salamander/anaconda3/envs/sigmorphon2024/bin/yoyodyne-train", line 8, in <module>
sys.exit(main())
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 390, in main
model = get_model_from_argparse_args(args, datamodule)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 214, in get_model_from_argparse_args
return model_cls(
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/transducer.py", line 43, in __init__
super().__init__(*args, **kwargs)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/lstm.py", line 36, in __init__
super().__init__(*args, **kwargs)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/base.py", line 155, in __init__
self.save_hyperparameters(
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/core/mixins/hparams_mixin.py", line 110, in save_hyperparameters
save_hyperparameters(self, *args, ignore=ignore, frame=frame)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/utilities/parsing.py", line 275, in save_hyperparameters
obj._hparams_initial = copy.deepcopy(obj._hparams)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 161, in deepcopy
rv = reductor(4)
TypeError: cannot pickle '_io.TextIOWrapper' object
Exception ignored in: <function tqdm.__del__ at 0x7f96d86a6290>
Traceback (most recent call last):
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
self.close()
File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1267, in close
if self.disable:
AttributeError: 'tqdm' object has no attribute 'disable'
From what I gather, the tqdm object within the expert module can't be pickled to distribute across multiple GPUs; the traceback shows the failure inside the copy.deepcopy call made by save_hyperparameters. This is fixed by adding expert to the ignore argument of save_hyperparameters, but I wanted to get feedback on whether there is a less 'hacky' way to deal with it.
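For reference, here is a minimal standard-library sketch of what I believe is happening, assuming the expert ends up holding an open text stream through its tqdm progress bar. FakeExpert and copy_hparams are hypothetical names made up for illustration; they are not yoyodyne or Lightning APIs, and copy_hparams only imitates the effect of the ignore argument.

```python
import copy
import os


class FakeExpert:
    """Hypothetical stand-in for the expert: it keeps an open text stream,
    much as a tqdm progress bar keeps a handle on its output file."""

    def __init__(self):
        # open() in text mode returns an _io.TextIOWrapper.
        self.stream = open(os.devnull, "w")

    def close(self):
        self.stream.close()


def copy_hparams(hparams, ignore=()):
    """Deep-copies the hyperparameters, skipping any keys named in `ignore`.

    This is an assumption-laden imitation of what ignoring a key achieves,
    not Lightning's actual implementation.
    """
    kept = {key: value for key, value in hparams.items() if key not in ignore}
    return copy.deepcopy(kept)


expert = FakeExpert()
hparams = {"hidden_size": 128, "expert": expert}

# Without ignoring the expert, deepcopy reaches the text stream and fails.
try:
    copy_hparams(hparams)
    failed = False
except TypeError as error:
    failed = True
    print(f"deepcopy failed: {error}")

# With the expert ignored, only plain values are copied.
safe_copy = copy_hparams(hparams, ignore=("expert",))
print(safe_copy)

expert.close()
```

If this picture is right, the multi-GPU setting may only be what exposes the problem: any hyperparameter that holds a live file handle would fail the same way as soon as it is deep-copied.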
@kylebgorman thoughts?