LearnTrajDep

some memory issues

Open Saltyptisan opened this issue 1 year ago • 7 comments

After multiple calls to loss_func.mpjpe_error and loss_func.euler_error, GPU memory is fully occupied. This may be due to some variables not being released properly.
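A plausible explanation, for anyone hitting the same thing: if the error functions are called on tensors that still require gradients, each call builds an autograd graph whose intermediate buffers stay on the GPU until the graph is freed. Below is a minimal sketch of the usual workaround, using a stand-in mpjpe_error and made-up tensor shapes rather than the repo's actual loss_func:

import torch

def mpjpe_error(pred, target):
    # Stand-in for loss_func.mpjpe_error from this repo; the real
    # signature and tensor layout may differ.
    return torch.mean(torch.norm(pred - target, dim=-1))

# Hypothetical shapes: (batch, frames, joints, xyz).
pred = torch.randn(16, 25, 22, 3, requires_grad=True)
target = torch.randn(16, 25, 22, 3)

# Evaluation does not need gradients: running the error calls under
# no_grad() keeps each call from building and retaining an autograd graph.
with torch.no_grad():
    err = mpjpe_error(pred, target)

# Accumulate a Python float, not the tensor, so nothing holds graph memory.
running_err = err.item()
print(running_err)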

Saltyptisan avatar Nov 17 '23 08:11 Saltyptisan

Hello, may I ask how you solved this problem? I found that when running quickdemo, the GPU memory used in each epoch accumulates and is not released. What should I do? Thank you!
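One way to confirm whether memory really accumulates across epochs (a sketch only; the toy model and loop below are illustrative, not quickdemo itself) is to log the allocated GPU memory at the end of every epoch and watch whether it grows:

import torch

model = torch.nn.Linear(66, 66).cuda()   # toy stand-in, not the repo's model
kept_losses = []                          # keeping loss *tensors* here leaks their graphs

def run_one_epoch():
    x = torch.randn(256, 66, device="cuda")
    loss = model(x).pow(2).mean()
    kept_losses.append(loss)              # leaky pattern: the whole graph stays alive
    # kept_losses.append(loss.item())     # fix: store a plain Python float instead

for epoch in range(5):
    run_one_epoch()
    torch.cuda.synchronize()
    print(f"epoch {epoch}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
    # If this number grows every epoch, some tensor (often a loss still
    # attached to its graph) is being kept alive between epochs.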

Logan-007L avatar Dec 07 '23 11:12 Logan-007L

data_utils.py

import torch
from torch.autograd import Variable  # Variable is deprecated but kept to match the original code

def expmap2rotmat_torch(r):
    """
    Converts exponential-map (expmap) vectors to rotation matrices.
    Batch PyTorch version ported from the corresponding method above.
    :param r: N*3
    :return: N*3*3
    """
    theta = torch.norm(r, 2, 1)
    r0 = torch.div(r, theta.unsqueeze(1).repeat(1, 3) + 0.0000001)
    r1 = torch.zeros_like(r0).repeat(1, 3)
    # print("In e2r 566, gpu allocated:", torch.cuda.memory_allocated())
    r1[:, 1] = -r0[:, 2]
    r1[:, 2] = r0[:, 1]
    r1[:, 5] = -r0[:, 0]
    r1 = r1.view(-1, 3, 3)
    r1 = r1 - r1.transpose(1, 2)
    n = r1.data.shape[0]

    # Detach every term of the Rodrigues formula so no autograd graph is
    # retained; gradients through this conversion are never used anyway.
    R = Variable(torch.eye(3, 3).repeat(n, 1, 1)).float().cuda().detach()
    R_2 = torch.mul(torch.sin(theta).unsqueeze(1).repeat(1, 9).view(-1, 3, 3), r1).detach()
    R_3 = torch.mul((1 - torch.cos(theta).unsqueeze(1).repeat(1, 9).view(-1, 3, 3)), torch.matmul(r1, r1)).detach()
    R = R + R_2 + R_3

    return R

I changed how R is computed. My guess is that every intermediate step in computing R was keeping its gradient graph, even though those gradients are never actually used.
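For reference, the same effect can also be had without the per-line .detach() calls: since gradients through this conversion are apparently never needed, the whole function can be run under no_grad(). A sketch of that variant (assumed equivalent in output; not tested against the repo):

import torch

@torch.no_grad()
def expmap2rotmat_torch(r):
    """
    Same Rodrigues-formula conversion (N*3 exponential maps -> N*3*3 rotation
    matrices), but the decorator stops autograd from building a graph at all,
    so no intermediate buffers are retained between calls.
    """
    theta = torch.norm(r, 2, 1)
    r0 = r / (theta.unsqueeze(1) + 1e-7)           # unit rotation axes
    r1 = torch.zeros(r.shape[0], 9, device=r.device, dtype=r.dtype)
    r1[:, 1] = -r0[:, 2]
    r1[:, 2] = r0[:, 1]
    r1[:, 5] = -r0[:, 0]
    r1 = r1.view(-1, 3, 3)
    r1 = r1 - r1.transpose(1, 2)                   # skew-symmetric cross-product matrix
    n = r1.shape[0]
    eye = torch.eye(3, device=r.device, dtype=r.dtype).repeat(n, 1, 1)
    R = eye \
        + torch.sin(theta).view(-1, 1, 1) * r1 \
        + (1 - torch.cos(theta)).view(-1, 1, 1) * torch.matmul(r1, r1)
    return R

If gradients ever do need to flow through this conversion (for example in a differentiable loss), the decorator would be wrong and the original graph-building behaviour should be kept.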

Saltyptisan avatar Dec 07 '23 12:12 Saltyptisan

Thanks a lot! I'm a beginner, so it's a real pleasure to get help like this!

Logan-007L avatar Dec 07 '23 12:12 Logan-007L

You're welcome. Happy to help~

Saltyptisan avatar Dec 08 '23 06:12 Saltyptisan

Sorry to trouble you again. I successfully ran the training code and found that it takes about 20 hours to train 50 epochs on the h3.6m dataset. Is that the same for you? The training is taking so long that I was wondering whether there is a problem with my server.

Logan-007L avatar Jan 13 '24 15:01 Logan-007L