torchvggish
GPU version support?
Thank you for your work. Could you add a GPU support option to the torch.hub loader? Another question: must the embedding size be fixed at 128, or is there a way to get 2048-dimensional embeddings?
I think you should be able to call model.to('cuda') to move the model to the GPU.
The model itself is pretty simple, so you should be able to load the pre-trained weights without the last layer. But that requires some manual work, e.g. forking this repo.
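A rough sketch of that manual work, assuming the embedding head is the small MLP used in this repo (the layer sizes below are taken from the VGGish architecture; the 2048-d replacement layer is hypothetical and would start untrained, so it needs fine-tuning):

```python
import torch.nn as nn

# Approximate VGGish embedding head: flattened conv features -> 128-d embedding.
embeddings = nn.Sequential(
    nn.Linear(512 * 4 * 6, 4096), nn.ReLU(True),
    nn.Linear(4096, 4096), nn.ReLU(True),
    nn.Linear(4096, 128), nn.ReLU(True),
)

# Swap the final 128-d projection for a 2048-d one. There are no
# pre-trained weights for this layer, so it must be trained on your data.
embeddings[-2] = nn.Linear(4096, 2048)
```

The earlier layers can still be initialised from the pre-trained checkpoint (e.g. with `load_state_dict(..., strict=False)`); only the replaced layer loses its weights.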
vggish strictly extracts one feature vector every 0.96 seconds, but my image features are extracted every 1 second. Do you have a good way to align the two sets of features? I look forward to your suggestions.
You should be able to just crop each 1-second audio segment to 0.96 seconds.
I'm sorry, I may have described the problem poorly. For example, my video is half an hour long. I take one frame per second and extract image features with ResNet-18, while the audio features come from VGGish. The image features have shape [30 * 60, 512], but the audio features have shape [30 * 60 / 0.96, 128]. I want to align the features along the time dimension. What should I do?
I found that a 4-second video does not have this problem, because [4, 512] == [4 / 0.96, 128] after rounding. Any suggestion is welcome, thanks very much.
@leemengxing this repo is just for the port of vggish to pytorch. I suggest you ask this question on https://groups.google.com/forum/#!forum/audioset-users - you're more likely to have a useful response from those guys 😄 I'm not really sure how to help with that particular problem other than to crop the audio per second to 0.96 like @stevenguh suggested.
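A sketch of the per-second cropping idea suggested above, assuming a 16 kHz mono waveform (the sample rate VGGish expects) and dummy audio data: split the waveform into 1 s chunks and keep only the first 0.96 s of each, so every 1 fps video frame pairs with exactly one VGGish example.

```python
import numpy as np

sr = 16000                      # VGGish expects 16 kHz mono audio
wav = np.zeros(30 * 60 * sr)    # half an hour of (dummy) audio
win = int(0.96 * sr)            # 15360 samples per VGGish example

# One 0.96 s crop per 1 s of audio -> one audio feature per video frame.
chunks = [wav[i * sr : i * sr + win] for i in range(len(wav) // sr)]
print(len(chunks))  # 1800
```

Each crop can then be fed to VGGish independently, giving feature matrices of matching length ([1800, 512] for the images and [1800, 128] for the audio) at the cost of discarding 0.04 s of audio per second.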
As the GPU support has been resolved upthread, I'm closing this issue now. Thanks.
I know this is closed, but when I try to send the model to CUDA using model.cuda(), PyTorch throws RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same. I solved this by adding the following code to VGGish.forward in vggish.py:
```python
def forward(self, x, fs=None):
    if self.preprocess:
        x = self._preprocess(x, fs)
    # start added code
    if next(self.parameters()).is_cuda:
        x = x.cuda()
    # end added code
    x = VGG.forward(self, x)
    if self.postprocess:
        x = self._postprocess(x)
    return x
```
It's not the most elegant solution; I'm just checking whether the model weights are on CUDA and, if so, moving the input there too. From my tests so far it seems to work, but please let me know if there is something wrong with it.
@botkevin nothing wrong with that if it works! However, I have realised that the offending line is: https://github.com/harritaylor/torchvggish/blob/e1e22734d7cff5fef0cd11bbfa631a2ae0b21123/torchvggish/vggish.py#L148. There is a way to serialise weights to CUDA automatically, AFAIK. I will try to fix this issue later today. Thanks for raising it!
https://github.com/harritaylor/torchvggish/pull/19
Sending the model to the GPU works fine, but PyTorch will complain with RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same unless the audio tensor is also sent to the GPU.
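A minimal sketch of the fix, using a stand-in `nn.Linear` module rather than the actual VGGish network: the model and the input tensor must live on the same device, otherwise PyTorch raises exactly that RuntimeError.

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(128, 10).to(device)   # stand-in for the VGGish network
x = torch.randn(4, 128).to(device)      # stand-in for the spectrogram batch

out = model(x)
print(out.shape)  # torch.Size([4, 10])
```

With torchvggish the same pattern applies: call `.to(device)` on both the hub-loaded model and the pre-processed input tensor before the forward pass.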
That said, the speedup is not dramatic because most of the time is spent in pre-processing. For a 2 second audio clip that I tested on CPU, 70 milliseconds were spent on pre-processing the audio file into an array of spectrogram patches, and 20 milliseconds were spent on inference itself.
Hi,
You guys can check my configuration based on v0.1 at https://github.com/nhattruongpham/torchvggish-gpu
That worked for me because I had converted the PCA params tensor to cuda.
Good luck!