
Some ideas to improve this project

Open 0b01 opened this issue 7 years ago • 5 comments

I am a seq2seq beginner so these are just my 2 cents. Correct me if I'm wrong.

  • Variable length output
  • Add weights to the loss function, e.g. give the first few predicted points bigger weights.
  • Dropout so it can better deal with noisy channels
  • ARIMA-like confidence interval (kind of like logits from softmax in the discrete case)
  • Use the new tf.contrib.seq2seq attention decoder

That's all.

0b01 avatar Jun 14 '17 03:06 0b01

  • Swap the loss function to NRMSE (normalized root-mean-square error)
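Something like this (a rough NumPy sketch; normalizing by the target range is just one convention, normalizing by the mean or std is also common):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root-mean-square error, normalized by the range of the targets."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())
```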

0b01 avatar Jun 14 '17 04:06 0b01

https://github.com/guillaume-chevalier/seq2seq-signal-prediction/commit/1b1701386cb1e40ecde3b8e7a5e7ae08cc256b3b#commitcomment-22530082

0b01 avatar Jun 14 '17 05:06 0b01

The error function should be mean absolute percentage error:

https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
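For example (a rough NumPy sketch; the epsilon guard is my own addition, since signals that cross zero make the raw percentage blow up):

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error, in percent. The denominator is
    clipped at `eps` to avoid division by zero near zero-crossings."""
    denom = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)
```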

0b01 avatar Jun 17 '17 18:06 0b01

Hi @rickyhan,

I think your ideas are nice and worth trying! I really appreciate your suggestions. Here are my comments:

  • Variable length output:
    • I would love it if someone implemented that. There's an issue for that: https://github.com/guillaume-chevalier/seq2seq-signal-prediction/issues/1
  • Add weights to the loss function, e.g. give the first few predicted points bigger weights.
    • Good idea! I have thought a lot about applying an exponential decay to the per-timestep loss weights, but I never did it. (There is a small sketch of this after the list below.)
  • Dropout so it can better deal with noisy channels
    • This is not a priority, but it would be interesting. Dropout would have considerably slowed training, and since this project was an interactive demo at a master class / workshop / conference, that would have been too slow at the time. Now it would be O.K.
  • ARIMA-like confidence interval (kind of like logits from softmax in the discrete case)
    • I wonder how one could implement that for a seq2seq. Maybe by randomly perturbing the state and inputs at each decoding step, recording many randomized decoding passes, and building an approximately Gaussian empirical distribution for each time step (see the Monte Carlo sketch after this list)? Another way would be to use a Mixture Density Network (MDN) RNN, such as here: https://github.com/zhaoyu611/basketball_trajectory_prediction
  • Use the new tf.contrib.seq2seq attention decoder
    • I have already thought about using attention for predicting time series; however, I am tempted to think it would not help.
  • Swap the loss function to NRMSE
    • I never tried nor thought to use this as a loss function for optimization, because MSE seems to be the standard. I am curious whether it would be good. I recall that optimizing the absolute error optimizes for the median, while optimizing the squared error (as in MSE) optimizes for the mean. For RMSE I have no clue, but the loss surface would appear less convex, which might be worse. At least, trying that empirically is simple with the current code. Normalizing, as in NRMSE, seems interesting too, although the outputs are already scaled according to how the inputs were scaled by their mean and std, so they should not diverge too much from there.
  • Replace enc_inp by expected_sparse_output: https://github.com/guillaume-chevalier/seq2seq-signal-prediction/commit/1b1701386cb1e40ecde3b8e7a5e7ae08cc256b3b#commitcomment-22530082
    • I tried to implement it, but it is not trivial: at test time we would rather feed the decoder its own predictions back (with feedback) than spoil it with the true test values, so that it can recover on its own at test time. There may be a simple way to code this; I would like feedback on that (see the loop_function sketch after this list). It might be possible to take inspiration from this code, but right now I don't have the time for that: https://github.com/tensorflow/models/tree/master/tutorials/rnn/translate
  • The error function should be mean absolute percentage error
    • Thanks for the hint! Yes, we could use that as an evaluation metric alongside the optimization objective (the loss).
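Regarding the loss weights, here is a rough, untested sketch of the exponential-decay idea (the list-of-tensors layout matches this project, but the names are illustrative):

```python
import tensorflow as tf  # assuming TensorFlow 1.x, as used in this project

def weighted_mse(outputs, targets, decay=0.9):
    """MSE where timestep t is weighted by decay**t, so the first
    predicted points count the most. `outputs` and `targets` are
    lists of [batch, output_dim] tensors, one per decoder timestep."""
    losses = [
        (decay ** t) * tf.reduce_mean(tf.square(out - target))
        for t, (out, target) in enumerate(zip(outputs, targets))
    ]
    return tf.add_n(losses) / len(losses)
```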
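For the confidence intervals, the randomized-decoding idea could look like this (a crude Monte Carlo sketch; `predict_fn` is a hypothetical wrapper around one decoding pass):

```python
import numpy as np

def mc_prediction_interval(predict_fn, x, n_samples=100, noise_std=0.05,
                           lower=5, upper=95):
    """Perturb the inputs with Gaussian noise, decode many times, and
    take per-timestep percentiles as a crude confidence band.
    `predict_fn(x)` returns one decoded sequence of shape [seq_len, dim]."""
    runs = np.stack([
        predict_fn(x + np.random.normal(0.0, noise_std, size=x.shape))
        for _ in range(n_samples)
    ])
    return (np.percentile(runs, lower, axis=0),
            np.percentile(runs, upper, axis=0))
```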
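And for feeding predictions back at test time, TF 1.x's legacy seq2seq decoder accepts a loop_function that replaces the ground-truth input with the previous output; a sketch, with `w_out` / `b_out` standing in for this project's output projection (illustrative names, untested):

```python
import tensorflow as tf  # assuming TensorFlow 1.x

def make_feed_previous_loop(w_out, b_out):
    """Build a loop_function for tf.contrib.legacy_seq2seq.rnn_decoder:
    at inference time, the projected previous output is fed back as the
    next decoder input instead of the ground-truth target."""
    def loop_fn(prev, i):
        return tf.matmul(prev, w_out) + b_out
    return loop_fn

# Usage sketch (test time only; at training time, omit loop_function):
# outputs, state = tf.contrib.legacy_seq2seq.rnn_decoder(
#     decoder_inputs, encoder_state, cell,
#     loop_function=make_feed_previous_loop(w_out, b_out))
```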

I'll create a CONTRIBUTING.md file right now for instructions on how to contribute, in case anyone is interested.

guillaume-chevalier avatar Jun 17 '17 23:06 guillaume-chevalier

Give a solution for each exercise

nicolapiccinelli avatar May 02 '18 13:05 nicolapiccinelli