Better manually clean up GPU memory when loading motions
I often hit CUDA out of memory during evaluation of the model (which is called periodically, every 1500 training iterations).
In motion_lib_real.py line 199 we load the motions into memory, transfer them to GPU tensors, and assign them to class attributes (e.g. self.gts). It seems the tensors previously stored in self.gts are not cleaned up automatically:
self.gts = torch.cat([m.global_translation for m in motions], dim=0).float().to(self._device)
self.grs = torch.cat([m.global_rotation for m in motions], dim=0).float().to(self._device)
self.lrs = torch.cat([m.local_rotation for m in motions], dim=0).float().to(self._device)
self.grvs = torch.cat([m.global_root_velocity for m in motions], dim=0).float().to(self._device)
self.gravs = torch.cat([m.global_root_angular_velocity for m in motions], dim=0).float().to(self._device)
self.gavs = torch.cat([m.global_angular_velocity for m in motions], dim=0).float().to(self._device)
self.gvs = torch.cat([m.global_velocity for m in motions], dim=0).float().to(self._device)
self.dvs = torch.cat([m.dof_vels for m in motions], dim=0).float().to(self._device)
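One likely reason the peak spikes here: when an attribute such as self.gts is re-assigned, the new concatenated tensor is fully materialized on the GPU before the old tensor's reference is dropped, so for a moment both copies are alive. A standalone sketch with toy tensors (not the PHC classes, sizes arbitrary) illustrating this:

import torch

class Holder:  # stand-in for the motion library object
    pass

device = "cuda"
holder = Holder()

# First load: ~1 GiB of fp32 data.
holder.gts = torch.zeros(256, 1024, 1024, device=device)

# Reload without clearing: the new tensor is allocated while the old one
# is still referenced, so the peak is roughly old + new.
torch.cuda.reset_peak_memory_stats(device)
holder.gts = torch.zeros(256, 1024, 1024, device=device)
print("peak on plain reload:", torch.cuda.max_memory_allocated(device) / 2**30, "GiB")  # ~2 GiB

# Reload after dropping the old reference first: the peak stays at ~1 GiB.
holder.gts = None
torch.cuda.empty_cache()          # optional here; returns cached blocks to the driver
torch.cuda.reset_peak_memory_stats(device)
holder.gts = torch.zeros(256, 1024, 1024, device=device)
print("peak with clearing  :", torch.cuda.max_memory_allocated(device) / 2**30, "GiB")  # ~1 GiB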
So it is better to manually drop the old tensors and clear the CUDA cache before loading (make sure gc is imported in the file):
self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs = None, None, None, None, None, None, None, None
gc.collect(); torch.cuda.empty_cache()
The same applies at line 208:
self.gts_t, self.grs_t, self.gvs_t, self.gavs_t = None, None, None, None
gc.collect(); torch.cuda.empty_cache()
and at line 214:
self.dof_pos = None
gc.collect(); torch.cuda.empty_cache()
This lets me train on a single RTX 4090. But I'm not sure this is the real cause; it is weird that the memory is not cleaned up automatically after new data is assigned to the old attributes.
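To separate what is and is not freed automatically: re-assigning the attribute does release the old tensor once its reference count drops to zero, but PyTorch's caching allocator keeps the freed blocks reserved for reuse, so nvidia-smi keeps reporting high usage until torch.cuda.empty_cache() is called; what actually overflows is the transient old-plus-new peak shown above. A toy sketch (illustrative only) of allocated vs reserved memory:

import gc
import torch

x = torch.zeros(256, 1024, 1024, device="cuda")    # ~1 GiB of fp32
print(torch.cuda.memory_allocated() / 2**30, torch.cuda.memory_reserved() / 2**30)  # ~1.0, ~1.0

x = torch.zeros(128, 1024, 1024, device="cuda")    # re-assign: the old tensor is freed by refcounting
print(torch.cuda.memory_allocated() / 2**30, torch.cuda.memory_reserved() / 2**30)  # ~0.5, ~1.5 (freed block stays cached)

del x
gc.collect()
torch.cuda.empty_cache()                            # hand the cached blocks back to the driver
print(torch.cuda.memory_allocated() / 2**30, torch.cuda.memory_reserved() / 2**30)  # ~0.0, ~0.0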
I also found something strange: there is already cleanup code at motion_lib_real.py line 77, but it is commented out.
# if "gts" in self.__dict__:
# del self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs
# del self._motion_lengths, self._motion_fps, self._motion_dt, self._motion_num_frames, self._motion_bodies, self._motion_aa
# if "gts_t" in self.__dict__:
# self.gts_t, self.grs_t, self.gvs_t
# if flags.real_traj:
# del self.q_gts, self.q_grs, self.q_gavs, self.q_gvs
Change it to this:
if "gts" in self.__dict__:
del self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs
del self._motion_lengths, self._motion_fps, self._motion_dt, self._motion_num_frames, self._motion_bodies, self._motion_aa
if "gts_t" in self.__dict__:
del self.gts_t, self.grs_t, self.gvs_t, self.gavs_t
if "dof_pos" in self.__dict__:
del self.dof_pos
if flags.real_traj:
del self.q_gts, self.q_grs, self.q_gavs, self.q_gvs
We can further cut down GPU memory usage by clearing variables after each evaluation. The variables in env._motion_eval_lib are not cleared when _motion_lib is switched back to _motion_train_lib after evaluation finishes, and we don't need them during training; they get loaded again anyway the next time evaluation runs, after another 1500 training epochs.
In phc/learning/im_amp.py line 227:
humanoid_env._motion_eval_lib.clear_cache() # add this
humanoid_env._motion_lib = humanoid_env._motion_train_lib
In phc/utils/motion_lib_real.py, add this function:
def clear_cache(self):
    if "gts" in self.__dict__:
        del self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs
        del self._motion_lengths, self._motion_fps, self._motion_dt, self._motion_num_frames, self._motion_bodies, self._motion_aa
    if "gts_t" in self.__dict__:
        del self.gts_t, self.grs_t, self.gvs_t, self.gavs_t
    if "dof_pos" in self.__dict__:
        del self.dof_pos
    if flags.real_traj:
        del self.q_gts, self.q_grs, self.q_gavs, self.q_gvs
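One hedged follow-up on the patch above: clear_cache() as written only drops the Python references. To make the drop visible in nvidia-smi and return the cached blocks to the driver, it can be paired with the same gc.collect()/torch.cuda.empty_cache() calls as in the earlier snippets, for example via a small call-site helper. The helper name below is made up; the attribute names come from the snippet above, and this is a suggestion rather than part of the original patch:

import gc
import torch

def release_eval_motions(humanoid_env):
    # Hypothetical call-site helper, not part of the original patch.
    humanoid_env._motion_eval_lib.clear_cache()     # drop references to the eval motion tensors
    gc.collect()                                    # collect the dropped tensors now
    torch.cuda.empty_cache()                        # return cached blocks to the driver
    humanoid_env._motion_lib = humanoid_env._motion_train_lib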
It is also possible to clear _motion_train_lib when entering evaluation (line 178 in im_amp.py), but it seems fine as it is for now.
A typical GPU memory usage timeline with num_envs=2048:
- 5 GB is allocated by the Gym simulation
- 12.5 GB to load the training variables
- 5 GB (at peak) to load the evaluation variables
- After evaluation, usage comes back to 5 + 12.5 GB.
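For reference, a timeline like this can be collected with the standard torch.cuda counters; a minimal sketch (the function name is made up) that can be called at the phase boundaries:

import torch

def log_gpu_mem(tag, device="cuda"):
    # Print live vs cached vs peak memory, then reset the peak counter for the next phase.
    gib = 1024 ** 3
    print(f"[{tag}] allocated={torch.cuda.memory_allocated(device) / gib:.2f} GiB  "
          f"reserved={torch.cuda.memory_reserved(device) / gib:.2f} GiB  "
          f"peak={torch.cuda.max_memory_allocated(device) / gib:.2f} GiB")
    torch.cuda.reset_peak_memory_stats(device)

# e.g. log_gpu_mem("after sim init"), log_gpu_mem("after train motions loaded"),
#      log_gpu_mem("after eval motions loaded"), log_gpu_mem("after eval + clear_cache")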
Thanks for pointing this out! Indeed, I had this cleanup code in older versions, but I removed it because I ran into issues with it when using MuJoCo for visualization: essentially, the motion states would get deleted when I tried to interact with the UI (to request the next motion frame).
Feel free to create a pull request for this. Thanks! @luoye2333
"I also have a single RTX 4090, and after training for 1500 episodes, it gets randomly killed during evaluation. Although there is no 'out of memory' message in the terminal, I think we might be facing the same issue. Thank you very much for your suggestion."
Hello, how do you monitor real-time memory usage? Do you have any related code? I encounter crashes every time I evaluate at 7500 steps, but when I check with nvidia-smi, there is still a lot of GPU memory available.