vq_bet_official

Dataloaders and training speed

Open · Abc11c opened this issue 6 months ago · 5 comments

Hi @jayLEE0301,

Thank you for releasing the code! I wanted to quickly benchmark the non-goal-conditioned Ant environment, but training feels slow. Am I missing something?

Is it because num_workers isn't set in the dataloaders?

I'm running this on an NVIDIA RTX A6000 with torch 1.12.1+cu116.
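One rough way to check whether data loading is actually the bottleneck is to time a pass over the DataLoader alone, with no model work, at a few num_workers settings. The sketch below uses a synthetic TensorDataset with made-up shapes as a stand-in for the repo's dataset; time_loader is a hypothetical helper, not something in vq_bet_official.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_loader(dataset, num_workers, batch_size=256, pin_memory=False):
    """Iterate over `dataset` once without any model work.

    Returns (seconds, number_of_samples). If the time is close to a full
    training epoch, the input pipeline (not the model) is the bottleneck.
    """
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=pin_memory,  # only useful when batches are copied to a GPU
    )
    start = time.perf_counter()
    n = 0
    for batch in loader:
        n += batch[0].shape[0]
    return time.perf_counter() - start, n


if __name__ == "__main__":
    # Hypothetical synthetic data; both tensors live on the CPU.
    obs = torch.randn(2_000, 10, 32)
    act = torch.randn(2_000, 10, 8)
    ds = TensorDataset(obs, act)
    for workers in (0, 2, 4):
        secs, n = time_loader(ds, num_workers=workers)
        print(f"num_workers={workers}: {n} samples in {secs:.2f}s")
```

If the loader-only time is small compared with an epoch, more workers are unlikely to help. For datasets that are already in-memory tensors, extra worker processes add overhead and can even be slower than num_workers=0.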

When I enable pin_memory and increase num_workers in the DataLoader, I get the following error:

Traceback (most recent call last):
  File "/home/USER/working/vq_bet_official/examples/train.py", line 162, in main
    for data in test_loader:
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/USER/working/vq_bet_official/examples/dataset.py", line 405, in __getitem__
    if self.dataset[i][0].ndim > 2:
  File "/home/USER/.conda/envs/vq-bet/lib/python3.9/site-packages/torch/utils/data/dataset.py", line 290, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/USER/working/vq_bet_official/examples/dataset.py", line 264, in __getitem__
    T = self.masks[idx].sum().int().item()
RuntimeError: CUDA error: initialization error
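The last frame fails on self.masks[idx].sum() inside a worker process. That error is typical when a dataset holds CUDA tensors and the DataLoader forks its workers (fork is the default start method on Linux): CUDA cannot be re-initialized in a forked child. I haven't checked how dataset.py creates masks, so this assumes it (or other dataset tensors) lives on the GPU. A minimal sketch of the usual workaround is to keep dataset tensors on the CPU, let the workers index them, and move each batch to the GPU in the main process. tensors_to_cpu and iterate_on_device are hypothetical helpers, not part of vq_bet_official.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset


def tensors_to_cpu(ds, _seen=None):
    """Recursively move every torch.Tensor attribute of `ds`, and of any
    Dataset it wraps (e.g. Subset.dataset), to the CPU in place.

    Hypothetical helper: it only handles direct tensor attributes; tensors
    stored inside lists or dicts are left untouched.
    """
    seen = set() if _seen is None else _seen
    if id(ds) in seen:
        return ds
    seen.add(id(ds))
    for name, value in list(vars(ds).items()):
        if isinstance(value, torch.Tensor):
            setattr(ds, name, value.cpu())
        elif isinstance(value, Dataset):
            tensors_to_cpu(value, seen)
    return ds


def iterate_on_device(dataset, device, batch_size=64, num_workers=4):
    """Build batches in CPU worker processes, then copy each batch to `device`."""
    loader = DataLoader(
        tensors_to_cpu(dataset),
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=device.type == "cuda",  # page-locked buffers for faster H2D copies
    )
    for batch in loader:
        # non_blocking only overlaps the copy with compute when memory is pinned.
        yield [t.to(device, non_blocking=True) for t in batch]
```

If the tensors need to stay on the GPU, passing multiprocessing_context="spawn" to the DataLoader is another option, at the cost of each worker creating its own CUDA context. With the whole dataset already on the GPU, num_workers=0 may turn out to be just as fast.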

Any help with this would be appreciated.

Thanks!

Abc11c, Aug 02 '24 15:08