Phil Wang

Results: 814 comments by Phil Wang

Will still need to add the functions for generating from cfg, as well as the full `AudioClip`. Perhaps organizing them by modality is good.

You got it; will make the changes next week.

Have a bunch of meetings with people around the valley this week; I'll get around to finishing this next week.

@lukewys @RetroCirce Hello Yusong and Ke! Thank you so much for offering your audio expertise; it is more helpful than you realize. The hyperparameters that I am unsure about are...

Also, I decided to keep a lot of the `image` naming in there, in case there is logic in the library that uses `encode_image` or accesses `.visual`. We are...
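One way to keep such callers working is to expose image-named aliases that simply forward to the audio tower. This is a minimal sketch under that assumption: `AudioTextModel`, its constructor arguments, and the stand-in encoders are hypothetical names for illustration, not the actual `AudioCLIP` class.

```python
class AudioTextModel:
    """Hypothetical audio-text model that keeps image-named aliases for old callers."""

    def __init__(self, audio_encoder, text_encoder):
        self.audio = audio_encoder  # the audio tower (hypothetical attribute name)
        self.text = text_encoder

    def encode_audio(self, audio):
        return self.audio(audio)

    def encode_text(self, text):
        return self.text(text)

    # Alias: code written for image models calls encode_image; forward it to audio.
    # Delegating (rather than `encode_image = encode_audio`) respects subclass overrides.
    def encode_image(self, audio):
        return self.encode_audio(audio)

    # Alias: code that reads `.visual` gets the audio tower instead.
    @property
    def visual(self):
        return self.audio


# Usage with toy stand-in encoders
model = AudioTextModel(audio_encoder=lambda xs: [x * 2 for x in xs], text_encoder=str.upper)
print(model.encode_image([1, 2]))   # [2, 4]
print(model.visual is model.audio)  # True
```

The trade-off is that the image names become slightly misleading, but nothing downstream has to change while the audio path is being built out.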

@marianna13 Oh hey Marianna! Good to hear from you. Yes, it should be able to accept spectrograms (you just have to pass in a tensor of shape `(batch, freqs, time)`).
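If it helps, here is a minimal NumPy sketch of getting a `(batch, freqs, time)` array from raw waveforms with a short-time Fourier transform. The helper name `magnitude_spectrogram` and the `n_fft`/`hop` values are illustrative assumptions, and this is not necessarily the preprocessing the model expects (log-mel spectrograms are more common in practice).

```python
import numpy as np


def magnitude_spectrogram(waveforms: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Hypothetical helper: (batch, samples) waveforms -> (batch, freqs, time) magnitudes."""
    batch, num_samples = waveforms.shape
    if num_samples < n_fft:
        raise ValueError("each waveform needs at least n_fft samples")
    num_frames = 1 + (num_samples - n_fft) // hop
    window = np.hanning(n_fft)
    # Index grid of overlapping frames: row t covers samples [t*hop, t*hop + n_fft)
    idx = np.arange(n_fft)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = waveforms[:, idx] * window          # (batch, time, n_fft)
    spec = np.abs(np.fft.rfft(frames, axis=-1))  # (batch, time, n_fft // 2 + 1)
    return spec.transpose(0, 2, 1)               # (batch, freqs, time)


rng = np.random.default_rng(0)
spec = magnitude_spectrogram(rng.standard_normal((2, 16000)))
print(spec.shape)  # (2, 257, 122)
```

To feed it to a PyTorch model, convert it with `torch.from_numpy(spec).float()` first.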

@marianna13 can you make sure the following code can run?

```python
import torch
from src.open_clip import AudioCLIP, CLIPAudioCfg, CLIPTextCfg

mulan = AudioCLIP(
    embed_dim = 512,
    audio_cfg = CLIPAudioCfg(),
    text_cfg =...
```

@marianna13 Ohh, what is the shape of the input tensor you are passing in? I thought spectrograms only have 1 channel, but I am not really an audio expert.

@marianna13 I can make it accommodate 3 channels if that is the case.
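A minimal sketch of one way an input pipeline could accept both layouts, assuming the model wants `(batch, channels, freqs, time)` with a configurable channel count. The helper `to_channels` is hypothetical, not part of open_clip, and tiling a single channel is just one common trick for feeding mono spectrograms to a 3-channel model.

```python
import numpy as np


def to_channels(spec: np.ndarray, channels: int) -> np.ndarray:
    """Hypothetical helper: return spec shaped (batch, channels, freqs, time).

    Accepts (batch, freqs, time), or (batch, c, freqs, time) with c == 1 or c == channels.
    """
    if spec.ndim == 3:
        spec = spec[:, None]  # add a singleton channel axis
    if spec.ndim != 4:
        raise ValueError(f"expected 3 or 4 dims, got shape {spec.shape}")
    c = spec.shape[1]
    if c == channels:
        return spec  # already in the requested layout
    if c == 1:
        # Tile the single channel so a 3-channel model can consume a mono spectrogram
        return np.repeat(spec, channels, axis=1)
    raise ValueError(f"cannot map {c} channels to {channels}")


mono = np.zeros((2, 32, 100))
print(to_channels(mono, 1).shape)  # (2, 1, 32, 100)
print(to_channels(mono, 3).shape)  # (2, 3, 32, 100)
```

Repeating the channel adds no information; alternatives are a learned 1-to-3 projection or simply configuring the model for 1 input channel.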

@marianna13

```python
import torch
from src.open_clip import AudioCLIP, CLIPAudioCfg, CLIPTextCfg

mulan = AudioCLIP(
    embed_dim = 512,
    audio_cfg = CLIPAudioCfg(channels = 3),
    text_cfg = CLIPTextCfg(),
)

spectrogram = torch.randn(2, 3, 32,...
```