Squeeze function called to reduce tensor dimensions on an input with non-singleton dimensions.

Open ThorJonsson opened this issue 5 years ago • 1 comments

Hi! Thank you for this nice collection of code to work on this interesting problem.

Unfortunately, I'm having some difficulty reproducing your results. In particular, I am not sure how you preprocess your audio data; in some places you use multiple frame sizes. This matters because the preprocessing method determines the shape of the model's input. The TCN_Davies model applies a series of 2D convolutions, then squeezes away the last dimension and feeds the result to a 1D CNN. However, my input data has the wrong shape, so the last dimension is not a singleton when the squeeze is called, the squeeze does nothing, and a 4-dimensional tensor is fed to the TCN.

https://github.com/julius-richter/beat_tracker/blob/b5b99718650f9e2be789133618c1efb32c731ba7/python/models.py#L45
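
To illustrate the behaviour described above, here is a minimal shape-only sketch in plain Python. It assumes the squeeze follows PyTorch-style semantics, where squeezing a given dimension removes it only if its size is 1 and otherwise returns the input unchanged. The function names and the example shapes are hypothetical and not taken from the repository.

```python
def squeeze_last(shape):
    """Mimic squeezing the last dimension: drop it only if its size is 1."""
    if shape[-1] == 1:
        return shape[:-1]
    # Silent no-op: this is the failure mode described in this issue.
    return shape


def squeeze_last_checked(shape):
    """Same as squeeze_last, but fail loudly instead of passing 4 dims on."""
    if shape[-1] != 1:
        raise ValueError(
            f"expected last dimension of size 1 before squeeze, got shape {shape}"
        )
    return shape[:-1]


# Hypothetical shapes after the 2D convolutions:
# (batch, channels, frames, remaining frequency bins)
ok = squeeze_last((1, 16, 3000, 1))   # last dimension is a singleton
bad = squeeze_last((1, 16, 3000, 5))  # last dimension is not a singleton

print(len(ok), ok)    # 3 dimensions, as a 1D CNN would expect
print(len(bad), bad)  # still 4 dimensions
```

A check like `squeeze_last_checked` placed before the squeeze would turn the silent shape mismatch into an explicit error that names the offending shape.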

I am trying to use the model according to the instructions here: https://github.com/julius-richter/beat_tracker/blob/master/jupyter/process.ipynb

Hope you can help,

Best,

Thor

Mar 04 '20 20:03 ThorJonsson

Hey Thor,

First of all, thanks for your message, and sorry for my late reply. Your message somehow disappeared in my mailbox.

The GitHub repository was more of a version control for myself than a collection of code for others. However, I think it would be nice to make the code understandable for everybody! I cleaned up the "process" notebook, so maybe things are clearer now :)

To answer your question, I would need to look into the code more closely, since I wrote it some time ago. For now, it may help to have a look at my master's thesis: https://www2.ak.tu-berlin.de/~akgroup/ak_pub/abschlussarbeiten/2019/Richter_MasA.pdf

Best, Julius

Apr 05 '20 09:04 julius-richter