
GPU version support?

Open leemengxing opened this issue 4 years ago • 10 comments

Thank you for your work. I would like to ask whether you can add a GPU support option to the torch.hub loader. Another question: must the obtained embedding_size be fixed at 128, or is there a way to convert it to 2048 dimensions?

leemengxing avatar Feb 27 '20 05:02 leemengxing

I think you should be able to do model.to('cuda') to move the model to CUDA.

The model itself is pretty simple, so you should be able to load the pre-trained weights without the last layer, but that requires some manual work and forking this repo.
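For reference, a minimal sketch of both ideas. The nn.Sequential below is a hypothetical stand-in for the VGGish embedding head (the real model is loaded via torch.hub), used only to illustrate moving a model to GPU and dropping the final layer:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the VGGish embedding head; the real model
# comes from torch.hub.load('harritaylor/torchvggish', 'vggish').
embedding_head = nn.Sequential(
    nn.Linear(512, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 128),  # final fixed 128-d embedding layer
)

# Move the model to GPU when one is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
embedding_head = embedding_head.to(device)

# Dropping the last Linear layer exposes the 4096-d activations instead of
# the fixed 128-d embedding; a true 2048-d output would need a new,
# re-trained projection layer.
truncated = nn.Sequential(*list(embedding_head.children())[:-1])

x = torch.randn(1, 512, device=device)
print(embedding_head(x).shape)  # torch.Size([1, 128])
print(truncated(x).shape)       # torch.Size([1, 4096])
```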

stevenguh avatar Feb 27 '20 17:02 stevenguh

VGGish extracts features strictly every 0.96 seconds, but my image features are extracted every 1 s. Do you have a good way to align the features? I look forward to your suggestions.

leemengxing avatar Mar 02 '20 04:03 leemengxing

You should be able to just crop the 1-second audio to 0.96 seconds.
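As a minimal sketch of that crop (VGGish operates on 16 kHz mono audio, so one second is 16,000 samples; the waveform here is random placeholder data):

```python
import numpy as np

SAMPLE_RATE = 16000   # VGGish expects 16 kHz mono audio
EXAMPLE_SECS = 0.96   # VGGish analyzes 0.96 s patches

one_second = np.random.randn(SAMPLE_RATE)            # 1 s of audio
cropped = one_second[: int(EXAMPLE_SECS * SAMPLE_RATE)]
print(cropped.shape)  # (15360,)
```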

stevenguh avatar Mar 02 '20 19:03 stevenguh

I'm sorry, I may have described the problem poorly. For example, my video is half an hour long. I select one frame per second and extract image features with ResNet-18, while the audio features come from VGGish. I found that the image features have shape [30 * 60, 512], but the audio features have shape [30 * 60 / 0.96, 128]. I want to align the features along the time dimension. What should I do?

leemengxing avatar Mar 02 '20 19:03 leemengxing

I found that a 4-second video does not have this problem, because [4, 512] == [4 / 0.96, 128] (4 / 0.96 rounds down to 4 frames). Any suggestion is welcome, thanks very much.

leemengxing avatar Mar 02 '20 19:03 leemengxing

@leemengxing this repo is just a port of VGGish to PyTorch. I suggest you ask this question on https://groups.google.com/forum/#!forum/audioset-users - you're more likely to get a useful response from those guys 😄 I'm not really sure how to help with that particular problem other than cropping the audio per second to 0.96 s, as @stevenguh suggested.
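For what it's worth, one common workaround is to resample the audio features to the video's frame times by nearest-frame indexing. A minimal NumPy sketch (align_audio_to_video is a hypothetical helper, not part of this repo, and the features are random placeholders):

```python
import numpy as np

HOP = 0.96  # seconds per VGGish frame

def align_audio_to_video(audio_feats, n_video_frames):
    """For each 1 s video frame, pick the nearest 0.96 s audio frame."""
    t = np.arange(n_video_frames)                     # video timestamps (s)
    idx = np.clip(np.round(t / HOP).astype(int), 0, len(audio_feats) - 1)
    return audio_feats[idx]

audio = np.random.randn(int(30 * 60 / 0.96), 128)     # (1875, 128)
aligned = align_audio_to_video(audio, 30 * 60)
print(aligned.shape)  # (1800, 128) — matches the 1800 video frames
```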

As the GPU support has been resolved upthread, I'm closing this issue now. Thanks.

harritaylor avatar Mar 03 '20 12:03 harritaylor

I know this is closed, but when I try to send the model to CUDA using model.cuda(), PyTorch throws RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same. I solved this by adding the following code to VGGish.forward in vggish.py:

def forward(self, x, fs=None):
    if self.preprocess:
        x = self._preprocess(x, fs)
    # start added code
    if next(self.parameters()).is_cuda:
        x = x.cuda()
    # end added code
    x = VGG.forward(self, x)
    if self.postprocess:
        x = self._postprocess(x)
    return x

It's not the most elegant solution; I'm just checking whether the model weights are on CUDA and, if so, moving the input data there too. From my tests so far it seems to work, but please let me know if there is something wrong with this.
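That device-check idiom can be exercised on its own, independent of VGGish. A minimal CPU-safe sketch (the Linear layer is just a placeholder model):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder standing in for the VGGish model

# Inspect the device of the first parameter, then move incoming data to
# match — the same check added to VGGish.forward above.
x = torch.randn(1, 4)
if next(model.parameters()).is_cuda:
    x = x.cuda()
print(model(x).shape)  # torch.Size([1, 2])
```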

botkevin avatar Aug 08 '20 00:08 botkevin

@botkevin nothing wrong with that if it works! However I have realised that the offending line is: https://github.com/harritaylor/torchvggish/blob/e1e22734d7cff5fef0cd11bbfa631a2ae0b21123/torchvggish/vggish.py#L148. There is a way to serialise weights to cuda automatically afaik. I will try to fix this issue later today. Thanks for raising it!

harritaylor avatar Aug 10 '20 14:08 harritaylor

https://github.com/harritaylor/torchvggish/pull/19

Sending the model to GPU works fine, but PyTorch will complain RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same unless the audio tensor is also sent to GPU.

That said, the speedup is not dramatic because most of the time is spent in pre-processing. For a 2-second audio clip that I tested on CPU, 70 milliseconds were spent pre-processing the audio file into an array of spectrogram patches, and 20 milliseconds on inference itself.

dfan avatar Sep 14 '20 20:09 dfan

Hi,

You can check my configuration, based on v0.1, at https://github.com/nhattruongpham/torchvggish-gpu

That worked for me because I converted the PCA params tensor to CUDA.

Good luck!

nhattruongpham avatar Sep 28 '22 18:09 nhattruongpham