
speed

MuyuXiaoxiang opened this issue on Nov 26, 2018 · 2 comments

Sorry to bother you. I don't understand why the training times of LSTM and SRU are the same with the code below. The dataset is simple and small; I want to use it for an experiment on time-series prediction. Is it because the dataset is too small? Is there something I should pay attention to? Thank you for your reply!

```python
import torch
import torch.nn as nn
from sru import SRU

class net(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, batch_size):
        super(net, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # self.batch_size = batch_size
        # self.lstm = nn.LSTM(input_size, hidden_size, num_layers)
        self.lstm = SRU(input_size, hidden_size, num_layers)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # both nn.LSTM and SRU return (output, hidden state)
        out, _ = self.lstm(x)
        return self.fc(out)
```

```python
import datetime
import torch.optim as optim
from torch.autograd import Variable

net = net(input_size, hidden_size, layer_nums, batch_size)
net = net.cuda()
epoch = 2000
criterion = nn.MSELoss()  # loss function
optimizer = optim.SGD(net.parameters(), lr=0.05)
train_loss = []
test_loss = []
start_time = datetime.datetime.now()
for i in range(epoch):
    for start, end in zip(range(0, len(x_train), batch_size),
                          range(batch_size, len(x_train) + 1, batch_size)):
        x = torch.from_numpy(x_train[start:end])
        y = torch.from_numpy(y_train[start:end]).cuda()
        input = Variable(x).cuda()
        output = Variable(y).cuda()
        out = net(input)
        loss = criterion(out, y)
        optimizer.zero_grad()  # zero the gradients
        loss.backward()
        optimizer.step()
end_time = datetime.datetime.now()
total_time = (end_time - start_time).seconds
```

MuyuXiaoxiang commented on Nov 26, 2018

Hi, sorry for the late reply.

Did you check the GPU usage (e.g. with nvidia-smi) while running the code? I suspect the usage is low and the bottleneck is I/O rather than computation.
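For instance, a quick way to poll utilization from Python while training runs (a sketch, assuming nvidia-smi is on the PATH) could be:

```python
import subprocess

# Query GPU utilization and memory use via nvidia-smi's CSV output.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used", "--format=csv"],
    capture_output=True, text=True,
)
print(result.stdout)
```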

The following lines of your code are not optimal:

        x = torch.from_numpy(x_train[start:end])
        y = torch.from_numpy(y_train[start:end]).cuda()
        input = Variable(x).cuda()
        output = Variable(y).cuda()

Moving data from CPU memory to GPU memory is expensive. Perhaps try creating the data as CUDA tensors outside the for loops? You can take a look at the classification and language-model examples:

- https://github.com/taolei87/sru/blob/master/classification/train_classifier.py#L98
- https://github.com/taolei87/sru/blob/master/language_model/train_lm.py#L133-L135
- https://github.com/taolei87/sru/blob/master/language_model/train_lm.py#L30-L34
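A minimal sketch of that idea, assuming x_train and y_train are NumPy arrays small enough to fit in GPU memory:

```python
import torch

# Move the full dataset to the GPU once, before the training loop,
# instead of copying each mini-batch from CPU to GPU every iteration.
x_all = torch.from_numpy(x_train).cuda()
y_all = torch.from_numpy(y_train).cuda()

for i in range(epoch):
    for start in range(0, len(x_all) - batch_size + 1, batch_size):
        x = x_all[start:start + batch_size]  # slicing stays on the GPU
        y = y_all[start:start + batch_size]
        out = net(x)
        loss = criterion(out, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

This only removes the per-batch host-to-device copies; since your dataset is small, keeping it entirely on the GPU should be feasible.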

taolei87 commented on Dec 9, 2018

OK! Thank you for your reply!

MuyuXiaoxiang commented on Dec 10, 2018