Kim Seonghyeon
@SURABHI-GUPTA 1. JPEGs will load faster than PNGs (due to smaller file sizes, decoding steps, ...), but you can use PNGs. 2. Also, it is for speeding up data loading times....
I think there are some problems between multiprocessing and the lmdb library... I think you can fix it temporarily by setting num_workers=1 in DataLoader.
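A minimal sketch of that workaround, assuming a stand-in `TensorDataset` in place of the real LMDB-backed dataset (the dataset and batch size here are hypothetical, only `num_workers` is the point):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the LMDB-backed dataset: 8 samples, 1 feature each.
dataset = TensorDataset(torch.arange(8, dtype=torch.float32).unsqueeze(1))

# Limit loading to a single worker process, as suggested above.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=1)
```

If a single worker still fails, `num_workers=0` loads the data in the main process and avoids worker processes entirely, at the cost of slower loading.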
On Linux it is not very problematic to set it larger than the actual data size... but maybe it is different on Windows.
This implementation implements a hierarchical VQ-VAE. It is in the model implementation: https://github.com/rosinality/vq-vae-2-pytorch/blob/4d2dbc0e073f033675843225dd7436550f9d6a47/vqvae.py#L164
Sorry for the late reply. I haven't tried this model in the audio domain, but I suspect that data normalization and preprocessing are crucial for log mel spectrograms, as this model doesn't have...
I haven't seen that kind of problem. I think both distributed and single-GPU training give similar results.
Sorry for the late reply. Currently this implementation does not support class-conditional generation. Some modification will be needed, like injecting conditions into the top PixelSNAIL network.
Yes, you can use it for that, as PixelSNAIL itself was used for natural images. You can also use conditions with it, and actually the PixelSNAIL for the bottom code is conditioned on...
Yes. You can try some methods to incorporate it as a condition.
There are many options. You can use the conditioning mechanism in the current implementation. You only need to transform your signal vector into spatial (2D, NCHW) feature maps. (You can simply use...
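One simple way to do that transformation is to tile the vector over every spatial position. This is only a sketch of one option, not necessarily the method the truncated sentence goes on to suggest. The function name `vector_to_feature_map` and the shapes are hypothetical, and NumPy is used here just to show the shape logic:

```python
import numpy as np

def vector_to_feature_map(signal, height, width):
    """Tile an (N, C) signal into an (N, C, H, W) feature map."""
    signal = np.asarray(signal)
    n, c = signal.shape
    # Add singleton H and W axes, then broadcast over all spatial positions.
    return np.broadcast_to(signal[:, :, None, None], (n, c, height, width))

# Two samples, each with a 3-dimensional conditioning signal.
signal = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
condition = vector_to_feature_map(signal, 4, 4)
print(condition.shape)  # (2, 3, 4, 4)
```

`np.broadcast_to` returns a read-only view, so copy it if you need to modify the result. On PyTorch tensors, the same indexing followed by `Tensor.expand` plays the same role.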