Phil Wang comments

Results: 814 comments of Phil Wang

add audio spectrogram transformer, and full audio clip

will still need to add the functions for generating from cfg, as well as the full `AudioClip`. perhaps by modality is good

add audio spectrogram transformer, and full audio clip

You got it, will make the changes next week

add audio spectrogram transformer, and full audio clip

Have a bunch of meetings with people around the valley this week, I'll get around to finishing this next week

add audio spectrogram transformer, and full audio clip

@lukewys @RetroCirce Hello Yusong and Ke! Thank you so much for offering your audio expertise; it is more helpful than you realize. The hyperparameters that I am unsure about are...

add audio spectrogram transformer, and full audio clip

also, decided to keep a lot of the `image` naming in there, in case there is a lot of logic in the library using `encode_image` or accessing `.visual`. we are...

add audio spectrogram transformer, and full audio clip

@marianna13 oh hey Marianna! good to hear from you. yes, it should be able to accept spectrograms (you just have to pass in a tensor of shape `batch, freqs, time`)
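
For readers unsure what a tensor of shape `batch, freqs, time` looks like, here is a minimal sketch that builds one from raw waveforms with `torch.stft`. The sample count, `n_fft`, and `hop_length` values are illustrative assumptions, not settings taken from the thread or the library.

```python
import torch

# Assumed, illustrative values (not from the thread).
batch, num_samples = 2, 16000
n_fft, hop_length = 512, 128

# Random noise stands in for real audio waveforms of shape (batch, samples).
waveform = torch.randn(batch, num_samples)

# Short-time Fourier transform; returns a complex tensor of shape
# (batch, n_fft // 2 + 1, num_frames).
stft = torch.stft(
    waveform,
    n_fft = n_fft,
    hop_length = hop_length,
    window = torch.hann_window(n_fft),
    return_complex = True,
)

# Power spectrogram: squared magnitude, laid out as (batch, freqs, time).
spectrogram = stft.abs() ** 2

print(spectrogram.shape)
```

The frequency axis has `n_fft // 2 + 1` bins, and with the default centered framing the time axis has `1 + num_samples // hop_length` frames.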

add audio spectrogram transformer, and full audio clip

@marianna13 can you make sure the following code can run

```python
import torch
from src.open_clip import AudioCLIP, CLIPAudioCfg, CLIPTextCfg

mulan = AudioCLIP(
    embed_dim = 512,
    audio_cfg = CLIPAudioCfg(),
    text_cfg =...
```

add audio spectrogram transformer, and full audio clip

@marianna13 ohh, what is the shape of the input tensor you are passing in? i thought spectrograms only have 1 channel, but i am not really an audio expert

add audio spectrogram transformer, and full audio clip

@marianna13 i can make it accommodate 3 channels, if that is the case
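
One way to picture accepting both single-channel and multi-channel spectrograms is a small shape-normalizing helper. This is only a sketch: `ensure_channel_dim` is a hypothetical name, and repeating a single channel to fill the expected count is an assumption, not how the library necessarily handles it.

```python
import torch

def ensure_channel_dim(spec: torch.Tensor, channels: int = 1) -> torch.Tensor:
    """Hypothetical helper: return a (batch, channels, freqs, time) tensor."""
    if spec.ndim == 3:
        # (batch, freqs, time) -> (batch, 1, freqs, time)
        spec = spec.unsqueeze(1)
    if spec.ndim != 4:
        raise ValueError(f"expected a 3D or 4D tensor, got {spec.ndim}D")
    if spec.shape[1] == 1 and channels > 1:
        # Assumption: broadcast the single channel to the expected count.
        spec = spec.expand(-1, channels, -1, -1)
    if spec.shape[1] != channels:
        raise ValueError(f"expected {channels} channels, got {spec.shape[1]}")
    return spec

mono = torch.randn(2, 32, 100)        # (batch, freqs, time)
three = torch.randn(2, 3, 32, 100)    # (batch, channels, freqs, time)

print(ensure_channel_dim(mono).shape)
print(ensure_channel_dim(mono, channels = 3).shape)
print(ensure_channel_dim(three, channels = 3).shape)
```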

add audio spectrogram transformer, and full audio clip

@marianna13

```python
import torch
from src.open_clip import AudioCLIP, CLIPAudioCfg, CLIPTextCfg

mulan = AudioCLIP(
    embed_dim = 512,
    audio_cfg = CLIPAudioCfg(channels = 3),
    text_cfg = CLIPTextCfg(),
)

spectrogram = torch.randn(2, 3, 32,...
```