Ross Wightman
@tungdq212 I for one have only tested the webdataset datasets in multi-gpu settings (and on a local 3x GPU machine there has been no issue with cc3m/12m tests in that...
@Jeadie re: the 403 Forbidden in the PR run, I assume that will succeed once merged, or is there a token/secret that needs to be added?
@gpucce @vturrisi the seq len of the attention pooler's output is determined by the size of the latent query; that's 256 for CoCa in the current config, this is...
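A minimal sketch of the idea, assuming a generic latent-query pooler (names and shapes are illustrative, not the exact open_clip implementation):

```python
import torch
import torch.nn as nn

class AttentionalPooler(nn.Module):
    """Illustrative latent-query attention pooler: the output seq len
    equals the number of latent query tokens, not the input seq len."""
    def __init__(self, d_model: int, n_queries: int = 256, n_head: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) image token embeddings
        q = self.query.unsqueeze(0).expand(x.shape[0], -1, -1)
        out, _ = self.attn(q, x, x, need_weights=False)
        return out  # (batch, n_queries, d_model), independent of input seq_len

pooler = AttentionalPooler(d_model=512, n_queries=256)
tokens = torch.randn(2, 196, 512)   # e.g. ViT patch tokens
print(pooler(tokens).shape)          # torch.Size([2, 256, 512])
```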
@gpucce looking at it, to match the paper it's not a matter of making it 257, it's a matter of needing two separate poolers, one for the contrastive token w/ n=1,...
@gpucce hmm, yeah, that’s a pickle, it obv still works fairly well (not abnormal in the wonderful world of DL / AI), we should add support for two poolers of...
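Following the above, a sketch of what the two-pooler setup could look like (reusing the illustrative AttentionalPooler from before; an assumption about the eventual fix, not the final open_clip API):

```python
class CoCaPoolers(nn.Module):
    # Hypothetical two-pooler arrangement per the CoCa paper: a single latent
    # query for the contrastive embedding, 256 for the captioning cross-attn.
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.contrastive_pool = AttentionalPooler(d_model, n_queries=1)
        self.caption_pool = AttentionalPooler(d_model, n_queries=256)

    def forward(self, x: torch.Tensor):
        contrastive = self.contrastive_pool(x).squeeze(1)  # (batch, d_model)
        caption_tokens = self.caption_pool(x)              # (batch, 256, d_model)
        return contrastive, caption_tokens
```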
I don't see how they'd be equivalent with the softmax there...
@gpucce right yeah, chunking q is fine and done in other situations so this should be as well … mixed up my dim for the softmax. But yeah, the projections...
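A quick single-head check of why chunking q is exact (projections omitted, illustrative only): the softmax normalizes each query row over the keys, so splitting the queries into chunks changes nothing:

```python
import torch
import torch.nn.functional as F

def attn(q, k, v):
    # softmax over the key dim, so each query row is normalized
    # independently of every other query row
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(8, 64)
k = torch.randn(32, 64)
v = torch.randn(32, 64)

full = attn(q, k, v)
chunked = torch.cat([attn(qc, k, v) for qc in q.chunk(4, dim=0)], dim=0)
print(torch.allclose(full, chunked, atol=1e-6))  # True: chunking q is exact
```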
@lucidrains awesome, we should probably put this audio-specific stuff in a new file, was thinking of splitting the other sub-transformers at some point too ... audio_transformer.py ?
yeah, was thinking by modality: leave the base transformer as the parent and split off modality-specific transformers, at least audio in this case since it's new, can split the others later...
@orchidmajumder good point, I believe that arg is also required if we want to use torch.compile with FSDP, at least in its current state
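For context, a minimal sketch assuming the arg in question is FSDP's use_orig_params (a real FSDP flag, though whether it's the one meant here is an assumption):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# assumes torch.distributed.init_process_group(...) has already been called
model = nn.Linear(512, 512).cuda()

# use_orig_params=True keeps the original nn.Parameter objects exposed through
# the wrapper, which torch.compile (dynamo) currently needs to trace FSDP models
model = FSDP(model, use_orig_params=True)
model = torch.compile(model)
```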