
TQDM Error with Multi-GPU Transducer

Open • bonham79 opened this issue 8 months ago • 1 comment

Error when running multi-GPU training with the edit-action transducer:

Traceback (most recent call last):
  File "/home/salamander/anaconda3/envs/sigmorphon2024/bin/yoyodyne-train", line 8, in <module>
    sys.exit(main())
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 390, in main
    model = get_model_from_argparse_args(args, datamodule)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 214, in get_model_from_argparse_args
    return model_cls(
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/transducer.py", line 43, in __init__
    super().__init__(*args, **kwargs)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/lstm.py", line 36, in __init__
    super().__init__(*args, **kwargs)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/base.py", line 155, in __init__
    self.save_hyperparameters(
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/core/mixins/hparams_mixin.py", line 110, in save_hyperparameters
    save_hyperparameters(self, *args, ignore=ignore, frame=frame)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/utilities/parsing.py", line 275, in save_hyperparameters
    obj._hparams_initial = copy.deepcopy(obj._hparams)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 297, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_io.TextIOWrapper' object
Exception ignored in: <function tqdm.__del__ at 0x7f96d86a6290>
Traceback (most recent call last):
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
    self.close()
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1267, in close
    if self.disable:
AttributeError: 'tqdm' object has no attribute 'disable'
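Both errors above can be reproduced without Lightning or tqdm. The sketch below is a minimal illustration, assuming a hypothetical `ProgressBar` class that, like tqdm, holds a text stream and has a `__del__` that calls `close()`, which reads an attribute set only in `__init__`. `copy.deepcopy` builds the copy with `__new__` (so `__init__` never runs), then fails with the `TypeError` while copying the stream; when the half-built copy is garbage-collected, its `__del__` raises the `AttributeError` that Python reports as "Exception ignored". The nested `hparams` dict is a hypothetical stand-in for Lightning's hyperparameters.

```python
import copy
import gc
import io
import sys


class ProgressBar:
    """Hypothetical stand-in for tqdm: holds a text stream, closes in __del__."""

    def __init__(self, stream):
        self.stream = stream
        self.disable = False

    def close(self):
        # Like tqdm.close(), this reads an attribute that only __init__ sets.
        if self.disable:
            return
        self.disable = True

    def __del__(self):
        self.close()


def reproduce():
    """Deep-copy hypothetical hyperparameters that contain a ProgressBar.

    Returns the exception type raised by copy.deepcopy and the exception
    types that Python reported as "Exception ignored" from __del__.
    """
    ignored = []
    previous_hook = sys.unraisablehook
    # Record errors raised in __del__ instead of printing them.
    sys.unraisablehook = lambda unraisable: ignored.append(unraisable.exc_type)
    copy_error = None
    try:
        bar = ProgressBar(io.TextIOWrapper(io.BytesIO(), encoding="utf-8"))
        hparams = {"expert": {"progress": bar}}  # hypothetical nesting
        try:
            copy.deepcopy(hparams)
        except TypeError as err:
            # Keep only the type so the traceback (and the half-built copy
            # it references) can be freed when this block ends.
            copy_error = type(err)
        # In case a reference cycle delays collection of the half-built copy.
        gc.collect()
    finally:
        sys.unraisablehook = previous_hook
    return copy_error, ignored


copy_error, ignored = reproduce()
```

The second traceback is therefore a side effect of the first failure, not a separate bug.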

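From what I gather, the tqdm progress bar inside the expert module can't be pickled, which is needed to distribute the model across multiple GPUs. The traceback shows the failure is raised by the `copy.deepcopy(obj._hparams)` call inside Lightning's `save_hyperparameters`, when the copy reaches tqdm's output stream (an `_io.TextIOWrapper`). Adding `expert` to the `ignore` argument of `save_hyperparameters` fixes it, but I wanted feedback on whether there is a less 'hacky' way to deal with it.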
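One possibly less hacky option, sketched here under assumptions: instead of listing `expert` in `ignore`, the object that owns the progress bar could leave the stream out of its copied state with `__getstate__`/`__setstate__`, which both `copy.deepcopy` and `pickle` consult. `Expert`, `table` and `stream` below are hypothetical names, not yoyodyne's actual expert API; a real version would drop the tqdm object itself and recreate it when needed.

```python
import copy
import io


class Expert:
    """Hypothetical stand-in for the transducer's expert module."""

    def __init__(self):
        # Ordinary state that can be copied and pickled.
        self.table = {"a": 1}
        # Stand-in for a tqdm bar's output stream; TextIOWrapper can't be pickled.
        self.stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

    def __getstate__(self):
        # Consulted by copy.deepcopy and pickle: keep everything but the stream.
        state = self.__dict__.copy()
        state["stream"] = None
        return state

    def __setstate__(self, state):
        # Restore the copied state; the copy starts without a stream.
        self.__dict__.update(state)


expert = Expert()
clone = copy.deepcopy(expert)  # succeeds, unlike deep-copying the raw stream
```

The trade-off is that the copy has no progress bar until one is recreated, whereas `ignore=["expert"]` keeps the expert out of the saved hyperparameters entirely.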

@kylebgorman thoughts?

bonham79 • Jun 08 '24 22:06