Austin
@Rayhane-mamah, I'm glad you got your project working. I was also going to suggest tuning your learning rate or using cyclical learning rates. Since the paper didn't give weight init...
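For reference, here's a minimal sketch of the triangular cyclical schedule I had in mind (in the spirit of Leslie Smith's CLR paper). The `base_lr`, `max_lr`, and `step_size` values are illustrative placeholders, not tuned for this model:

```python
def triangular_clr(step, base_lr=1e-4, max_lr=1e-3, step_size=2000):
    """Triangular cyclical learning rate: ramp linearly from base_lr
    to max_lr over step_size steps, back down, and repeat."""
    cycle = step // (2 * step_size)            # which cycle we're in
    x = abs(step / step_size - 2 * cycle - 1)  # position in cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Peek at one full cycle: base -> max at step 2000 -> back to base at 4000.
for s in (0, 1000, 2000, 3000, 4000):
    print(s, triangular_clr(s))
```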
That's an interesting interpretation; I hadn't thought about that, and without the authors' code to refer to we can truly only guess. With regard to the results, I haven't written...
Welcome @fatchord. I originally thought the same, but ended up following the equations in the Bengio paper. I then realized that the attention weights are cumulative, since we add the...
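To make the cumulative part concrete, here's a toy NumPy sketch of alignments being summed across decoder steps. Adding `cumulative` straight into the energies is just a stand-in for the real location features (which get convolved first), so treat the shapes and the energy computation as assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T = 10                    # encoder timesteps
cumulative = np.zeros(T)  # running sum of all previous alignments

for step in range(5):     # decoder steps
    # Toy energies: a real model computes these from the query, the
    # encoder states, and features of `cumulative`.
    energies = np.random.randn(T) + cumulative
    alignment = softmax(energies)
    cumulative += alignment  # this sum is what makes the weights cumulative
```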
They predict multiple frames in the first paper, I believe, though it was my understanding that this one is meant to be faster. Granted, this is a Google paper that's...
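For what it's worth, the multi-frame trick from the first paper amounts to a reshape: the decoder emits r frames' worth of mels per step. A toy sketch, with `r` and the shapes as my assumptions:

```python
import numpy as np

r, n_mels = 3, 80
batch, decoder_steps = 2, 4

# Hypothetical decoder output: r frames' worth of mels per decoder step.
decoder_out = np.random.randn(batch, decoder_steps, r * n_mels)

# Unfold into individual frames: (batch, decoder_steps * r, n_mels).
frames = decoder_out.reshape(batch, decoder_steps * r, n_mels)
```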
Right, the mixture of logistics encoding is from PixelCNN++. I just built a baby WaveNet that's generating sine waves right now. It seems I have to add local conditioning. My...
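In case it's useful to anyone else, sampling from a mixture of logistics is pretty simple even though the discretized loss is fiddly. A toy sketch (Gumbel-max to pick a component, then inverse-CDF sampling); the argument layout is my assumption, not how the PixelCNN++ code organizes it:

```python
import numpy as np

def sample_from_mol(logit_probs, means, log_scales, rng=np.random):
    """Draw one sample from a K-component mixture of logistics.
    All three arguments are length-K vectors, one entry per component."""
    # Gumbel-max trick: pick a mixture component from the logits.
    gumbel = -np.log(-np.log(rng.uniform(1e-5, 1 - 1e-5, logit_probs.shape)))
    k = np.argmax(logit_probs + gumbel)
    # Inverse-CDF sample from the chosen logistic component.
    u = rng.uniform(1e-5, 1 - 1e-5)
    return means[k] + np.exp(log_scales[k]) * (np.log(u) - np.log(1 - u))
```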
Thanks for the pointers with regard to WaveNet. The diagrams from the original paper led me to believe that the kernel size should always be 2! Ironically, the Parallel WaveNet...
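Here's why kernel size 2 is all the diagrams show: with causal left-padding, each output only sees the current sample and the one `dilation` steps back. A minimal NumPy sketch (the 0.5/0.5 kernel is arbitrary):

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """1-D dilated causal convolution with kernel size 2.
    x: (T,) signal; w: (2,) kernel. Output t sees x[t - dilation] and x[t]."""
    padded = np.concatenate([np.zeros(dilation), x])  # causal left-padding
    return w[0] * padded[:-dilation] + w[1] * padded[dilation:]

x = np.random.randn(16)
y = dilated_causal_conv(x, np.array([0.5, 0.5]), dilation=4)
assert y.shape == x.shape
```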
Yeah, with the fast queues, generation time grows only in proportion to the number of layers in your network, and they mentioned it isn't much faster (2x maybe) unless you're using...
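For anyone who hasn't seen the queues: each layer caches the last `dilation` inputs it saw, so producing one new sample is O(1) work per layer, and the total cost per sample scales with depth. A toy single-channel sketch (the weights and the class name are mine, not from the paper):

```python
from collections import deque

class LayerQueue:
    """Per-layer cache for fast WaveNet-style generation: the oldest
    cached value is exactly the sample `dilation` steps in the past."""
    def __init__(self, dilation):
        self.buf = deque([0.0] * dilation, maxlen=dilation)

    def step(self, x, w0=0.5, w1=0.5):
        past = self.buf[0]   # input from `dilation` steps ago
        self.buf.append(x)   # maxlen evicts the oldest entry
        return w0 * past + w1 * x
```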
In concept it looks correct, except that I use only the last layer's hidden state as the query vector, as is common in NMT. As a warning, I'm not familiar...
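To spell out the query-vector bit, here's a toy dot-product attention where the query is a single hidden state, e.g. the top decoder layer's. Everything here (shapes, dot-product scoring) is illustrative, not a claim about any particular implementation:

```python
import numpy as np

def attend(encoder_states, query):
    """encoder_states: (T, d); query: (d,), e.g. the last layer's
    hidden state. Returns the context vector and the alignment."""
    scores = encoder_states @ query           # (T,) similarity per timestep
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over time
    return weights @ encoder_states, weights
```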
I think it's better to ignore the padding, since it won't always be available as a feature; it's just an artifact of wanting to batch inputs. Learning to use padding...
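Concretely, ignoring the padding just means masking it out of the loss. A minimal sketch, assuming (batch, T, d) tensors and per-example true lengths:

```python
import numpy as np

def masked_mse(pred, target, lengths):
    """MSE over real frames only; padded positions contribute nothing,
    so the model can't learn to exploit the padding."""
    batch, T, _ = pred.shape
    mask = np.arange(T)[None, :] < np.asarray(lengths)[:, None]  # (batch, T)
    sq_err = ((pred - target) ** 2).mean(axis=-1)                # (batch, T)
    return (sq_err * mask).sum() / mask.sum()
```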
Plots at 4k steps: you can see that, for each frame, it doesn't put weight on anything past the end token, except in cases where there's silence (it doesn't seem...