Robocasa Language Embedding Cuda Out of Memory Error
In line 222 of the robocasa branch of robomimic/utils/train_utils.py, the dataset kwargs are deepcopied during dataset creation. Since the language embedding model is one of the dataset kwargs, this produces a copy of the model as well, which caused a CUDA out-of-memory error for me when training on a large number of dataset files. For example, with LIBERO's 90 datasets there are 90 copies of the language embedding model in CUDA memory. I made a quick modification that fixed the problem:
```python
for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)
    # Re-use the original encoder instead of the deepcopied one
    # so we do not run out of CUDA memory
    if "lang_encoder" in ds_kwargs:
        ds_kwargs_copy["lang_encoder"] = ds_kwargs["lang_encoder"]
    keys = ["hdf5_path", "filter_by_attribute"]
    for k in keys:
        ds_kwargs_copy[k] = ds_kwargs[k][i]
    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))
```
Should I make this a PR? It might be more efficient to pop `lang_encoder` from `ds_kwargs` before the deepcopy so it is never copied at all (with the fix above, the copy is still created and then immediately discarded).
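For reference, a rough sketch of what that pop-based variant could look like. Variable names (`ds_weights`, `ds_kwargs`, `ds_langs`, `ds_class`, `ds_list`) follow the snippet above; the exact surrounding code in train_utils.py may differ:

```python
from copy import deepcopy

# Remove the encoder once so deepcopy never touches the CUDA model.
lang_encoder = ds_kwargs.pop("lang_encoder", None)

ds_list = []
for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)
    if lang_encoder is not None:
        # Share the single encoder instance across all datasets.
        ds_kwargs_copy["lang_encoder"] = lang_encoder
    for k in ["hdf5_path", "filter_by_attribute"]:
        ds_kwargs_copy[k] = ds_kwargs[k][i]
    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))
```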