LearnTrajDep

some memory issues

Open Saltyptisan opened this issue 1 year ago • 7 comments

After multiple calls to loss_func.mpjpe_error and loss_func.euler_error, GPU memory is fully occupied. This may be due to some variables not being released properly.
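A plausible explanation, for anyone hitting the same thing: if the error functions are called on tensors that still require gradients, each call builds an autograd graph whose intermediate buffers stay on the GPU until the graph is freed. Below is a minimal sketch of the usual workaround, using a stand-in mpjpe_error and made-up tensor shapes rather than the repo's actual loss_func:

import torch

def mpjpe_error(pred, target):
    # Stand-in for loss_func.mpjpe_error from this repo; the real
    # signature and tensor layout may differ.
    return torch.mean(torch.norm(pred - target, dim=-1))

# Hypothetical shapes: (batch, frames, joints, xyz).
pred = torch.randn(16, 25, 22, 3, requires_grad=True)
target = torch.randn(16, 25, 22, 3)

# Evaluation does not need gradients: running the error calls under
# no_grad() keeps each call from building and retaining an autograd graph.
with torch.no_grad():
    err = mpjpe_error(pred, target)

# Accumulate a Python float, not the tensor, so nothing holds graph memory.
running_err = err.item()
print(running_err)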

Saltyptisan avatar Nov 17 '23 08:11 Saltyptisan

Hello, may I ask how you solved this problem? I found that when running quickdemo, the GPU memory used in each epoch accumulates and is not released. What should I do? Thank you!
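One way to confirm whether memory really accumulates across epochs (a sketch only; the toy model and loop below are illustrative, not quickdemo itself) is to log the allocated GPU memory at the end of every epoch and watch whether it grows:

import torch

model = torch.nn.Linear(66, 66).cuda()   # toy stand-in, not the repo's model
kept_losses = []                          # keeping loss *tensors* here leaks their graphs

def run_one_epoch():
    x = torch.randn(256, 66, device="cuda")
    loss = model(x).pow(2).mean()
    kept_losses.append(loss)              # leaky pattern: the whole graph stays alive
    # kept_losses.append(loss.item())     # fix: store a plain Python float instead

for epoch in range(5):
    run_one_epoch()
    torch.cuda.synchronize()
    print(f"epoch {epoch}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
    # If this number grows every epoch, some tensor (often a loss still
    # attached to its graph) is being kept alive between epochs.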

Logan-007L avatar Dec 07 '23 11:12 Logan-007L

data_utils.py

import torch
from torch.autograd import Variable  # Variable is deprecated but kept to match the original code

def expmap2rotmat_torch(r):
    """
    Converts exponential-map (expmap) vectors to rotation matrices.
    Batch PyTorch version ported from the corresponding method above.
    :param r: N*3
    :return: N*3*3
    """
    theta = torch.norm(r, 2, 1)
    r0 = torch.div(r, theta.unsqueeze(1).repeat(1, 3) + 0.0000001)
    r1 = torch.zeros_like(r0).repeat(1, 3)
    # print("In e2r 566, gpu allocated:", torch.cuda.memory_allocated())
    r1[:, 1] = -r0[:, 2]
    r1[:, 2] = r0[:, 1]
    r1[:, 5] = -r0[:, 0]
    r1 = r1.view(-1, 3, 3)
    r1 = r1 - r1.transpose(1, 2)
    n = r1.data.shape[0]

    # Detach every term of the Rodrigues formula so no autograd graph is
    # retained; gradients through this conversion are never used anyway.
    R = Variable(torch.eye(3, 3).repeat(n, 1, 1)).float().cuda().detach()
    R_2 = torch.mul(torch.sin(theta).unsqueeze(1).repeat(1, 9).view(-1, 3, 3), r1).detach()
    R_3 = torch.mul((1 - torch.cos(theta).unsqueeze(1).repeat(1, 9).view(-1, 3, 3)), torch.matmul(r1, r1)).detach()
    R = R + R_2 + R_3

    return R

I changed how R is computed. My guess is that every intermediate step in computing R was keeping its gradient graph, even though those gradients are never actually used.
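For reference, the same effect can also be had without the per-line .detach() calls: since gradients through this conversion are apparently never needed, the whole function can be run under no_grad(). A sketch of that variant (assumed equivalent in output; not tested against the repo):

import torch

@torch.no_grad()
def expmap2rotmat_torch(r):
    """
    Same Rodrigues-formula conversion (N*3 exponential maps -> N*3*3 rotation
    matrices), but the decorator stops autograd from building a graph at all,
    so no intermediate buffers are retained between calls.
    """
    theta = torch.norm(r, 2, 1)
    r0 = r / (theta.unsqueeze(1) + 1e-7)           # unit rotation axes
    r1 = torch.zeros(r.shape[0], 9, device=r.device, dtype=r.dtype)
    r1[:, 1] = -r0[:, 2]
    r1[:, 2] = r0[:, 1]
    r1[:, 5] = -r0[:, 0]
    r1 = r1.view(-1, 3, 3)
    r1 = r1 - r1.transpose(1, 2)                   # skew-symmetric cross-product matrix
    n = r1.shape[0]
    eye = torch.eye(3, device=r.device, dtype=r.dtype).repeat(n, 1, 1)
    R = eye \
        + torch.sin(theta).view(-1, 1, 1) * r1 \
        + (1 - torch.cos(theta)).view(-1, 1, 1) * torch.matmul(r1, r1)
    return R

If gradients ever do need to flow through this conversion (for example in a differentiable loss), the decorator would be wrong and the original graph-building behaviour should be kept.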

Saltyptisan avatar Dec 07 '23 12:12 Saltyptisan

Thanks a lot! I'm a beginner, so it's a real pleasure to get help like this!

Logan-007L avatar Dec 07 '23 12:12 Logan-007L

You're welcome. Happy to help~

Saltyptisan avatar Dec 08 '23 06:12 Saltyptisan

Sorry to trouble you again. I successfully ran the training code and found that it takes about 20 hours to train 50 epochs on the h3.6m dataset. Is that the same for you? The training is taking so long that I was wondering whether there is a problem with my server.

Logan-007L avatar Jan 13 '24 15:01 Logan-007L