DrummerNet
Implementation details
Thanks a lot for the great work.
I still have a few questions about implementation details. First, what is the reason for partitioning the training procedure by powers of 2?
Second, I am confused about the normalization. For the source dataset, you use maximum-absolute-value normalization, while for the sample dataset you scale with lambda x: 10 * 1.0 / x.pow(2).sum().sqrt().
Can you give more insight into this choice?
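To make the comparison concrete, here is a minimal sketch of the two normalizations as I understand them. It is written in plain Python so it runs without PyTorch. The function names `peak_normalize` and `l2_scale` are my own, and I am assuming the lambda's return value is used as a multiplier on the signal, which may not match the repository exactly.

```python
import math

def peak_normalize(x):
    """Maximum-absolute-value normalization: the largest |sample| becomes 1.
    Rough torch equivalent: x / x.abs().max()
    """
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]

def l2_scale(x, target=10.0):
    """Scale so the L2 norm of the signal equals `target`.
    Assumption: the lambda 10 * 1.0 / x.pow(2).sum().sqrt() is a factor
    that is multiplied onto x.
    """
    norm = math.sqrt(sum(v * v for v in x))
    return [v * target / norm for v in x]

def energy(x):
    return sum(v * v for v in x)

# Two toy signals with different shapes.
click = [0.0, 0.0, 1.0, 0.0]          # short and peaky
sustained = [0.5, -0.5, 0.5, -0.5]    # lower peak, spread-out energy

for name, x in [("click", click), ("sustained", sustained)]:
    p = peak_normalize(x)
    s = l2_scale(x)
    print(name,
          "| peak-normalized energy:", energy(p),
          "| L2-scaled peak:", max(abs(v) for v in s))
```

Peak normalization fixes the maximum amplitude but leaves the energy dependent on the shape of the signal, whereas the L2 scaling fixes the energy and lets the peak vary. Both functions divide by zero on an all-silent input, so a real implementation would need a guard.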
Third, why did you choose to pad your sequences with low-level noise rather than zeros?
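For reference, the two padding options I am comparing look roughly like this. This is a hypothetical sketch in plain Python: the helper names, the noise scale of 1e-4, and the log-energy feature are my own choices for illustration, not taken from the repository. The only difference I can see is that a log-scaled feature stays finite on the noise-padded region; whether that is the actual motivation is my assumption.

```python
import math
import random

def pad_zeros(x, length):
    """Right-pad x with exact zeros up to `length` samples."""
    return x + [0.0] * (length - len(x))

def pad_noise(x, length, scale=1e-4, seed=0):
    """Right-pad x with small Gaussian noise (std = `scale`, an assumed value)."""
    rng = random.Random(seed)
    return x + [rng.gauss(0.0, scale) for _ in range(length - len(x))]

def log_energy(frame):
    """Log of the frame energy; returns -inf for an all-zero frame."""
    e = sum(v * v for v in frame)
    return math.log(e) if e > 0 else float("-inf")

x = [0.3, -0.2, 0.1]
zero_padded = pad_zeros(x, 8)
noise_padded = pad_noise(x, 8)

# Compare a frame that lies entirely inside the padded region.
print("zeros:", log_energy(zero_padded[4:]))
print("noise:", log_energy(noise_padded[4:]))
```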