Some questions about preprocess_elect.py and data_loader.py
Hi,
I have a few questions about preprocess_elect.py:
- In prep_data(), v_input[:, 1] is never used (read or written), so why do you need this 2nd column? https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L35
- About x_input (https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L58): from index 1 onward, x_input[count, 1:, 0] contains the real raw input data, but x_input[count, 0, 0] is never assigned, so it remains all 0s and contains no real raw input data (on https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L67 x_input[count, 0, 0] is also zero). Why don't you just drop all such x_input[:, 0, :], since they are wrong training data? And why save them in the final train npy file? I.e., change https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L72-L74 to
np.save(prefix+'data_'+save_name, x_input[:, 1:, :])
np.save(prefix+'v_'+save_name, v_input[1:, :])
np.save(prefix+'label_'+save_name, label[1:, :])
I inspected the saved train data and confirmed that these entries are all 0s:
>>> import numpy as np
>>> t = np.load("data/elect/train_data_elect.npy")
>>> np.max(t[:, 0, 0])
0.0
>>> np.min(t[:, 0, 0])
0.0
>>>
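To illustrate why that first position stays zero, here is a minimal numpy sketch; the shapes are made up for illustration, and only the assignment pattern mirrors prep_data(), together with the slicing change suggested above:

```python
import numpy as np

# Hypothetical shapes: (num_windows, window_size, num_covariates + 1)
num_windows, window_size, num_cov = 4, 5, 2
x_input = np.zeros((num_windows, window_size, num_cov + 1), dtype='float32')

# Only positions 1: of channel 0 are ever written (as at preprocess_elect.py#L58);
# position 0 of channel 0 is never assigned and stays 0.
x_input[:, 1:, 0] = np.random.rand(num_windows, window_size - 1)

assert np.all(x_input[:, 0, 0] == 0)  # the all-zero entries the REPL check confirms

# Dropping that position before saving removes the unassigned zeros:
trimmed = x_input[:, 1:, :]
assert trimmed.shape == (num_windows, window_size - 1, num_cov + 1)
```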
@Zhazhan
My 3rd question:
https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L58
x_input[count, 1:, 0] = data[window_start:window_end-1, series]
So x_input[:, :, 0] is the raw input sequence data,
but in: https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L440-L445
cov = all_data[:, :, 2:] # is the raw input sequence data dropped here?
split_start = len(label[0]) - self.pred_length + 1
data, label = split(split_start, label, cov, self.pred_length)
return data, label
Is it dropped from the training data?
This is the same question I have here: https://github.com/ant-research/Pyraformer/issues/25#issuecomment-1509923168
So the previous values of the raw input sequence are not used at all in training?
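To make the slicing concrete, here is a small torch sketch; the channel layout is my assumption (channel 0 = raw value, channels 2+ = covariates), but whatever the layout, all_data[:, :, 2:] keeps only channels 2 onward:

```python
import torch

# Assumed layout for illustration only: channel 0 holds the raw series,
# channels 2+ hold covariates; [:, :, 2:] discards channels 0 and 1.
batch, seq_len, channels = 2, 6, 5
all_data = torch.arange(batch * seq_len * channels, dtype=torch.float32)
all_data = all_data.reshape(batch, seq_len, channels)

cov = all_data[:, :, 2:]
assert cov.shape == (batch, seq_len, channels - 2)
# channel 0 of cov is the original channel 2, not the raw series in channel 0
assert torch.equal(cov[..., 0], all_data[..., 2])
```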
OK, for my question 3), I found:
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L443
data, label = split(split_start, label, cov, self.pred_length)
which, at https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L398-L403,
single_data = batch_label[i:(split_start+i)].clone().unsqueeze(1)
single_data[-1] = -1
single_cov = cov[batch_idx, i:(split_start+i), :].clone()
temp_data = [single_data, single_cov]
single_data = torch.cat(temp_data, dim=1)
all_data.append(single_data)
inserts the label (as previous values in the window) back into all_data. This is confusing; why did you choose to do it this way?
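As far as I can tell, the loop does something like the following simplified, single-series sketch of that train-side code (shapes and names are assumed): the label window becomes channel 0 of the model input, with the last position masked to -1, presumably the step to predict:

```python
import torch

# Simplified single-series sketch of the train-side window construction
pred_length, seq_len = 2, 8
split_start = seq_len - pred_length + 1      # mirrors split_start in the loader

batch_label = torch.rand(seq_len)            # labels for one series
cov = torch.rand(seq_len, 3)                 # its covariates

all_data = []
for i in range(pred_length):
    single_data = batch_label[i:(split_start + i)].clone().unsqueeze(1)  # labels as channel 0
    single_data[-1] = -1                     # last position masked, presumably the target step
    single_cov = cov[i:(split_start + i), :].clone()
    single_data = torch.cat([single_data, single_cov], dim=1)
    all_data.append(single_data)

windows = torch.stack(all_data)
assert windows.shape == (pred_length, split_start, 4)
assert (windows[:, -1, 0] == -1).all()       # every window's last label entry is masked
```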
Also, the implementation of electTrainDataset.__getitem__
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L432
is so different from electTestDataset.__getitem__
https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L460
in particular https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L473-L477:
single_data = data[i:(split_start+i)].clone().unsqueeze(1)
single_data[-1] = -1
single_cov = cov[i:(split_start+i), :].clone()
single_data = torch.cat([single_data, single_cov], dim=1)
all_data.append(single_data)
Here, you don't insert the label (as previous values in the window) back into all_data in the same way. Why is there such a difference?
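For comparison, here is a compact sketch (names are assumed) of what the two snippets appear to do: as far as these excerpts go, the window construction is structurally the same, differing in which tensor supplies channel 0 (batch_label in the train loop vs. data in the test loop):

```python
import torch

def build_windows(series, cov, pred_length, split_start):
    """Shared window-building pattern seen in both __getitem__ snippets (a sketch)."""
    windows = []
    for i in range(pred_length):
        ch0 = series[i:(split_start + i)].clone().unsqueeze(1)
        ch0[-1] = -1  # mask the last position in channel 0
        windows.append(torch.cat([ch0, cov[i:(split_start + i), :].clone()], dim=1))
    return torch.stack(windows)

pred_length, seq_len = 2, 8
split_start = seq_len - pred_length + 1
label = torch.rand(seq_len)   # the train loop feeds batch_label here
data = torch.rand(seq_len)    # the test loop feeds data here
cov = torch.rand(seq_len, 3)

train_w = build_windows(label, cov, pred_length, split_start)
test_w = build_windows(data, cov, pred_length, split_start)
assert train_w.shape == test_w.shape == (pred_length, split_start, 4)
```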