Hello, thanks for your great work. I'm confused about the dataset.

Open StarDxxx opened this issue 3 years ago • 10 comments

Hello sir, I'm confused about the dataset. Could you share dataset_57M.npz or another demo dataset? I don't know what structure the dataset should have.

StarDxxx avatar Apr 16 '22 05:04 StarDxxx

Hello, for the dataset used in these examples, please see #2 . The expected structure of the input data is described in the Transformer's documentation; you can implement your own dataset as long as it matches this input shape.

maxjcohen avatar Apr 25 '22 08:04 maxjcohen

Hi, I have read the docs. I understand the model's inputs and outputs as follows: d_input and d_output are the numbers of input and output features. For example, if we use PM2.0 and PM5 to predict the pollution level, d_input and d_output are 2 and 1, respectively. However, I don't understand the parameter K in the input and output tensors with shape (batch_size, K, d_output).

chuzheng88 avatar Apr 25 '22 09:04 chuzheng88

In other words, I want to solve a regression task, described as follows: X has two features, X = [[x01, x02, ..., x0j], [x11, x12, ..., x1j]], and Y (the labels) has one feature, Y = [y1, y2, ..., yj]. Put simply, we use two sequences to predict one sequence, like using the sin and cos functions to predict the tan function. In this case, how should we construct the dataset?

chuzheng88 avatar Apr 25 '22 09:04 chuzheng88

K is the length of the time series. In your example K=j, each batch of data should consist of inputs with shape (batch_size, j, 2) and outputs with shape (batch_size, j, 1).
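
As an illustration of these shapes, here is a minimal sketch in NumPy that cuts two input series and one target series into windows of length K. The helper `make_windows`, the choice K=12, and the bounded target sin(t)·cos(t) (used instead of tan, which diverges near its poles) are assumptions made for this example and are not taken from the repository.

```python
import numpy as np

def make_windows(K=12, n_points=1200):
    """Hypothetical helper: build (n_windows, K, d) arrays from long series."""
    t = np.linspace(0.0, 20.0 * np.pi, n_points)
    # Two input features per time step: shape (n_points, 2)
    features = np.stack([np.sin(t), np.cos(t)], axis=-1)
    # One output feature per time step: shape (n_points, 1)
    target = (np.sin(t) * np.cos(t))[:, None]
    # Cut the series into consecutive, non-overlapping windows of length K
    n_windows = n_points // K
    X = features[: n_windows * K].reshape(n_windows, K, 2)
    Y = target[: n_windows * K].reshape(n_windows, K, 1)
    return X.astype(np.float32), Y.astype(np.float32)

X, Y = make_windows()
print(X.shape, Y.shape)  # (100, 12, 2) (100, 12, 1)
```

Each batch drawn along the first axis then has inputs of shape (batch_size, K, 2) and outputs of shape (batch_size, K, 1).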

maxjcohen avatar Apr 25 '22 10:04 maxjcohen

Thanks for your reply. In this case, can the parameter attention_size be set to any value <= K?

chuzheng88 avatar Apr 25 '22 11:04 chuzheng88

Yes, exactly!

maxjcohen avatar Apr 25 '22 12:04 maxjcohen

Hi, I used a dataset X, produced by the sin function, to predict Y (produced by the cos function), with K set to 12. During validation, the loss is nan, and I don't know why. The whole code is shown in the following screenshots: (screenshot) (screenshot) (screenshot)

chuzheng88 avatar Apr 26 '22 06:04 chuzheng88

Hi, I don't see directly where a NaN could come from. I encourage you to step through the validation loss computation in a debugger to see which tensor or function is producing it.
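
One way to narrow this down is to check each array for non-finite values in the order it is produced. The sketch below uses NumPy, and the helper `first_non_finite` is a hypothetical name invented for this example; with PyTorch tensors, `torch.isfinite` plays the same role, and `torch.autograd.set_detect_anomaly(True)` can help locate a NaN that appears during the backward pass.

```python
import numpy as np

def first_non_finite(named_arrays):
    """Hypothetical helper: name of the first array holding NaN or inf, else None."""
    for name, array in named_arrays.items():
        if not np.isfinite(np.asarray(array, dtype=np.float64)).all():
            return name
    return None

# Toy data with the shapes discussed above: (batch_size, K, d) with K=12
inputs = np.sin(np.linspace(0.0, 6.0, 24)).reshape(2, 12, 1)
targets = np.cos(np.linspace(0.0, 6.0, 24)).reshape(2, 12, 1)
predictions = targets.copy()
predictions[0, 3, 0] = np.nan  # simulated faulty model output

# Check in pipeline order: data first, then model output
culprit = first_non_finite(
    {"inputs": inputs, "targets": targets, "predictions": predictions}
)
print(culprit)  # predictions
```

If the inputs and targets are clean but the predictions or the loss are not, the problem lies in the model or the loss function rather than in the dataset.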

maxjcohen avatar Apr 26 '22 08:04 maxjcohen

In fact, the loss is already nan while the network is training, e.g.: (screenshot)

In my opinion, with loss_function = OZELoss(alpha=0.3), the training loss shouldn't be nan, but I don't understand why it is.

Furthermore, I used the compute_loss function to calculate the loss during validation, as follows: (screenshot)

chuzheng88 avatar Apr 26 '22 08:04 chuzheng88

Is my dataset wrong? (screenshot)

chuzheng88 avatar Apr 26 '22 08:04 chuzheng88