
Question about training the MAT model

Open jwc19890114 opened this issue 5 years ago • 0 comments

Sorry to bother you again. While training with your demo, I noticed that Y/mask/weight are all loaded onto the GPU, but the model and X are not, and during training the CPU has the highest utilization. Normally this kind of loading would raise an error, so why does training still work correctly here? I also tried moving X and the model onto the GPU, but the computation time did not seem to shrink. Could you explain what is going on?

for epoch in range(num_epoch):
    start = time.time()
    num, total_loss = 0, 0
#     if epoch == 5000:
#         optimizer.param_groups[0]['lr'] = lr * 0.1
    data = tav_data_iterator(
        corpus_indice, topics_indice, batch_size, max(length) + 1)
#     hidden = model.module.init_hidden(num_layers, batch_size, hidden_dim)
    weight = torch.ones(len(vocab))
    weight[0] = 0
    for X, Y, mask, topics in tqdm(data):
        num += 1
#         hidden.detach_()
        if use_gpu:
#             X = X.to(device)
            Y = Y.to(device)
            mask = mask.to(device)
#             topics = topics.to(device)
#             hidden = hidden.to(device)
            weight = weight.to(device)
        optimizer.zero_grad()
#         output, hidden = model(X, topics, hidden)
        output, hidden = model(X, topics)
        hidden.detach_()
#         l = F.cross_entropy(output, Y.t().reshape((-1,)), weight)
        l, _ = adaptive_softmax(output, Y.t().reshape((-1,)))
        loss = -l.reshape((-1, batch_size)).t() * mask
        loss = loss.sum(dim=1) / mask.sum(dim=1)
        loss = loss.mean()
        loss.backward()
        norm = nn.utils.clip_grad_norm_(model.parameters(), 1e-2)
        optimizer.step()
        total_loss += loss.item()
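For comparison, the usual PyTorch convention is to move the model and every input tensor onto the same device before the forward pass, since an op that mixes CPU and GPU tensors normally raises a RuntimeError. Below is a minimal sketch of that pattern; the linear model and tensor shapes here are illustrative stand-ins, not the MAT model from the demo:

```python
import torch
from torch import nn
import torch.nn.functional as F

# Hypothetical tiny model standing in for the MAT model.
model = nn.Linear(4, 2)

# Pick one device and move *both* the model and all inputs to it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

X = torch.randn(8, 4).to(device)          # inputs on the same device as the model
Y = torch.randint(0, 2, (8,)).to(device)  # targets likewise

output = model(X)                          # no device mismatch, so no RuntimeError
loss = F.cross_entropy(output, Y)
```

If `X` stayed on the CPU while `model` lived on the GPU, the `model(X)` call would fail with a device-mismatch error; when everything is on the CPU (as in the question), the whole forward/backward pass simply runs on the CPU, which is why no error appears.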

Thank you. I did not expect to receive your reply as soon as yesterday; I came over here from your blog.

jwc19890114 avatar Jun 13 '20 05:06 jwc19890114