machine-learning-practice-code
A question about MAT model training
Sorry to bother you again. While training with your demo, I noticed that Y, mask, and weight are all loaded onto the GPU, but the model and X are not, and during training the CPU shows the highest utilization. Normally I'd expect this kind of mixed loading to raise an error, so why does training run correctly here? I also tried putting X and the model on the GPU, but the computation time didn't seem to shrink. Could you explain what's going on?
for epoch in range(num_epoch):
    start = time.time()
    num, total_loss = 0, 0
    # if epoch == 5000:
    #     optimizer.param_groups[0]['lr'] = lr * 0.1
    data = tav_data_iterator(
        corpus_indice, topics_indice, batch_size, max(length) + 1)
    # hidden = model.module.init_hidden(num_layers, batch_size, hidden_dim)
    weight = torch.ones(len(vocab))
    weight[0] = 0  # ignore the padding index
    for X, Y, mask, topics in tqdm(data):
        num += 1
        # hidden.detach_()
        if use_gpu:
            # X = X.to(device)
            Y = Y.to(device)
            mask = mask.to(device)
            # topics = topics.to(device)
            # hidden = hidden.to(device)
            weight = weight.to(device)
        optimizer.zero_grad()
        # output, hidden = model(X, topics, hidden)
        output, hidden = model(X, topics)
        hidden.detach_()
        # l = F.cross_entropy(output, Y.t().reshape((-1,)), weight)
        l, _ = adaptive_softmax(output, Y.t().reshape((-1,)))
        loss = -l.reshape((-1, batch_size)).t() * mask
        loss = loss.sum(dim=1) / mask.sum(dim=1)
        loss = loss.mean()
        loss.backward()
        norm = nn.utils.clip_grad_norm_(model.parameters(), 1e-2)
        optimizer.step()
        total_loss += loss.item()
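As a general rule, PyTorch only raises a device-mismatch error when a single operation receives tensors on different devices; if the forward pass runs entirely on the CPU, it proceeds without complaint, just slowly. Two things are worth checking with the loop above: which device the model's parameters and each tensor actually report, and whether the timing comparison accounts for CUDA's asynchronous kernel launches (wall-clock timing without a synchronize measures launch time, not compute time). Here is a minimal sketch with a hypothetical stand-in model (`nn.Linear`, not the MAT model from the demo) showing both checks:

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; the real model would be moved the same way.
model = nn.Linear(512, 512).to(device)
x = torch.randn(64, 512, device=device)

# Every parameter and every input should report the same device;
# a mismatch here is what produces the familiar RuntimeError.
print(next(model.parameters()).device, x.device)

start = time.time()
y = model(x)
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the kernel to finish before timing
elapsed = time.time() - start
print(f"forward pass: {elapsed:.6f}s, output on {y.device}")
```

If moving X and the model to the GPU still shows no speedup after synchronizing, the bottleneck is likely elsewhere (e.g. the data iterator feeding batches from the CPU).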
Thank you! I didn't expect a reply as soon as the next day. I came over from your blog.