
Better to manually clean up GPU memory when loading motions

Open luoye2333 opened this issue 1 year ago • 5 comments

I often hit CUDA out-of-memory errors during model evaluation (which is called periodically, every 1500 training iterations).

In motion_lib_real.py line 199 we load the motions into memory, transfer them to GPU tensors, and assign them to class attributes (e.g. self.gts). Perhaps the tensors previously held in self.gts are not freed automatically.

self.gts = torch.cat([m.global_translation for m in motions], dim=0).float().to(self._device)
self.grs = torch.cat([m.global_rotation for m in motions], dim=0).float().to(self._device)
self.lrs = torch.cat([m.local_rotation for m in motions], dim=0).float().to(self._device)
self.grvs = torch.cat([m.global_root_velocity for m in motions], dim=0).float().to(self._device)
self.gravs = torch.cat([m.global_root_angular_velocity for m in motions], dim=0).float().to(self._device)
self.gavs = torch.cat([m.global_angular_velocity for m in motions], dim=0).float().to(self._device)
self.gvs = torch.cat([m.global_velocity for m in motions], dim=0).float().to(self._device)
self.dvs = torch.cat([m.dof_vels for m in motions], dim=0).float().to(self._device)

So it is better to manually clear the old tensors before loading:

# Drop references to the previously loaded motion tensors, then release the cached blocks
# (needs "import gc" at the top of motion_lib_real.py if it is not already imported).
self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs = None, None, None, None, None, None, None, None
gc.collect(); torch.cuda.empty_cache()

Do the same at line 208:

self.gts_t, self.grs_t, self.gvs_t, self.gavs_t = None, None, None, None
gc.collect(); torch.cuda.empty_cache()

and at line 214:

self.dof_pos = None
gc.collect(); torch.cuda.empty_cache()

This lets me train on a single RTX 4090. But I'm not sure this is the root cause; it is weird that the memory is not cleaned up automatically after assigning new data to the old attributes.
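One likely explanation: in self.gts = torch.cat(...).to(self._device) the new tensor is fully allocated before the old self.gts reference is dropped, so at that moment both copies live on the GPU, and even after the old one is freed the PyTorch caching allocator keeps the blocks reserved, so nvidia-smi does not show a drop until torch.cuda.empty_cache(). A standalone sketch (hypothetical Holder class, made-up tensor sizes, not PHC code) that shows the roughly 2x peak:

import torch

class Holder:
    pass

h = Holder()
h.gts = torch.randn(512, 1024, 1024, device="cuda")  # ~2 GiB of float32

torch.cuda.reset_peak_memory_stats()
# Plain reassignment: the new tensor is built while h.gts still points at the old one,
# so the peak is roughly twice the tensor size.
h.gts = torch.randn(512, 1024, 1024, device="cuda")
print(torch.cuda.max_memory_allocated() / 2**30, "GiB peak (plain reassignment)")

torch.cuda.reset_peak_memory_stats()
# Dropping the reference first keeps the peak at roughly one tensor size.
h.gts = None
h.gts = torch.randn(512, 1024, 1024, device="cuda")
print(torch.cuda.max_memory_allocated() / 2**30, "GiB peak (clear first)")

# Even after tensors are freed, the caching allocator keeps the blocks reserved;
# nvidia-smi only reflects the drop after:
torch.cuda.empty_cache()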

luoye2333 avatar Nov 13 '24 06:11 luoye2333

Found a strange thing: there is already cleanup code in motion_lib_real.py line 77, but it is commented out.

# if "gts" in self.__dict__:
#     del self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs
#     del self._motion_lengths, self._motion_fps, self._motion_dt, self._motion_num_frames, self._motion_bodies, self._motion_aa
#     if "gts_t" in self.__dict__:
#         self.gts_t, self.grs_t, self.gvs_t
#     if flags.real_traj:
#         del self.q_gts, self.q_grs, self.q_gavs, self.q_gvs

Change it to this:

if "gts" in self.__dict__:
    del self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs
    del self._motion_lengths, self._motion_fps, self._motion_dt, self._motion_num_frames, self._motion_bodies, self._motion_aa
if "gts_t" in self.__dict__:
    del self.gts_t, self.grs_t, self.gvs_t, self.gavs_t
if "dof_pos" in self.__dict__:
    del self.dof_pos
if flags.real_traj:
    del self.q_gts, self.q_grs, self.q_gavs, self.q_gvs

luoye2333 avatar Nov 13 '24 08:11 luoye2333

GPU memory usage can be cut down further by clearing variables after the last evaluation step. The tensors in env._motion_eval_lib are not cleared when _motion_lib is switched back to _motion_train_lib after evaluation finishes, and we don't need them during training. They will be reloaded anyway at the next evaluation, another 1500 training epochs later.

In phc/learning/im_amp.py line 227:

humanoid_env._motion_eval_lib.clear_cache() # add this
humanoid_env._motion_lib = humanoid_env._motion_train_lib

In phc/utils/motion_lib_real.py, add this function:

def clear_cache(self):
    # Drop references to all motion tensors loaded by this lib so they can be freed.
    if "gts" in self.__dict__:
        del self.gts, self.grs, self.lrs, self.grvs, self.gravs, self.gavs, self.gvs, self.dvs
        del self._motion_lengths, self._motion_fps, self._motion_dt, self._motion_num_frames, self._motion_bodies, self._motion_aa
    if "gts_t" in self.__dict__:
        del self.gts_t, self.grs_t, self.gvs_t, self.gavs_t
    if "dof_pos" in self.__dict__:
        del self.dof_pos
    if flags.real_traj:
        del self.q_gts, self.q_grs, self.q_gavs, self.q_gvs
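
Pairing clear_cache() with gc.collect() and torch.cuda.empty_cache() at the call site also makes the freed memory visible in nvidia-smi, since del only drops the Python references and the caching allocator keeps the blocks reserved. A sketch of that call site (assuming gc and torch are already imported in im_amp.py):

humanoid_env._motion_eval_lib.clear_cache()  # drop references to the eval motion tensors
gc.collect()                                 # collect any lingering references
torch.cuda.empty_cache()                     # return the freed blocks to the driver
humanoid_env._motion_lib = humanoid_env._motion_train_lib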

It would also be possible to clear _motion_train_lib when entering evaluation (around line 178 in im_amp.py), but memory usage seems fine without that.

A typical GPU memory usage timeline with num_envs=2048: ~5 GB allocated by the Gym simulation, ~12.5 GB to load the training variables, and another ~5 GB (at peak) to load the evaluation variables. After evaluation it comes back down to 5 + 12.5 GB. (See the attached gpu0_memory_usage plot.)
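
For reference, such a timeline can be captured by polling the used memory of GPU 0 once per second, e.g. with nvidia-smi --query-gpu=memory.used --format=csv -l 1, or with a small script like the sketch below (assumes the pynvml package; this is just one possible way, not necessarily the script behind the plot above). Note that this reports the total memory used on the device, the same number nvidia-smi shows, not just PyTorch's allocated tensors.

import csv, time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
with open("gpu0_memory_usage.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time_s", "used_mib"])
    t0 = time.time()
    try:
        while True:  # stop with Ctrl+C
            used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
            writer.writerow([round(time.time() - t0, 1), used // 2**20])
            f.flush()
            time.sleep(1.0)
    except KeyboardInterrupt:
        pass
pynvml.nvmlShutdown()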

luoye2333 avatar Nov 18 '24 05:11 luoye2333

Thanks for pointing this out! Indeed I had this cleanup code in older versions, but I removed it because I ran into issues with it when using MuJoCo for visualization. Essentially, the motion states would get deleted when I tried to interact with the UI (to request the next motion frame).

Feel free to create a pull request for this. Thanks! @luoye2333

ZhengyiLuo avatar Dec 10 '24 01:12 ZhengyiLuo

"I also have a single RTX 4090, and after training for 1500 episodes, it gets randomly killed during evaluation. Although there is no 'out of memory' message in the terminal, I think we might be facing the same issue. Thank you very much for your suggestion."

onlyloveyanzi avatar Aug 10 '25 09:08 onlyloveyanzi

(quoting luoye2333's comment above about clear_cache() and the GPU memory usage timeline)

Hello, how do you monitor real-time memory usage? Do you have any related code? I encounter crashes every time I evaluate at 7500 steps, but when I check with nvidia-smi, there is still a lot of GPU memory available.

onlyloveyanzi avatar Aug 11 '25 06:08 onlyloveyanzi