Ross Wightman
@alita-moore okay, thanks for clarifying, that makes more sense. It would also be possible to check with the normal pretrained weights, they'll still validate reasonably if the sizes that force the...
Shouldn't the attention mask be adjusted in either case? I feel that ensuring the mask makes sense is more important than padding before vs. after the shift, as both will alter the validity...
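A minimal sketch of the point about keeping the mask consistent, assuming a plain list-of-token-ids setup with a right shift (decoder-input style) and a hypothetical start token. The helper names `pad_with_mask` and `shift_right` are illustrative only, not any library's API. Whichever order is used, the mask is built or shifted alongside the tokens so it always marks which positions are real.

```python
from typing import List, Tuple

def pad_with_mask(ids: List[int], length: int, pad_id: int,
                  left: bool = False) -> Tuple[List[int], List[int]]:
    """Pad ids to `length`; the mask is 1 for real tokens and 0 for padding."""
    n_pad = length - len(ids)
    if n_pad < 0:
        raise ValueError("sequence is longer than the target length")
    pad, ones, zeros = [pad_id] * n_pad, [1] * len(ids), [0] * n_pad
    if left:
        return pad + ids, zeros + ones
    return ids + pad, ones + zeros

def shift_right(ids: List[int], mask: List[int],
                start_id: int) -> Tuple[List[int], List[int]]:
    """Shift one position right, prepending start_id; the mask shifts with the tokens."""
    return [start_id] + ids[:-1], [1] + mask[:-1]

# Pad then shift: the last real token survives, one pad slot is consumed.
ids, mask = pad_with_mask([5, 6, 7], 5, pad_id=0)
print(shift_right(ids, mask, start_id=1))   # → ([1, 5, 6, 7, 0], [1, 1, 1, 1, 0])

# Shift then pad: the last real token (7) is dropped before padding.
s_ids, _ = shift_right([5, 6, 7], [1, 1, 1], start_id=1)
print(pad_with_mask(s_ids, 5, pad_id=0))    # → ([1, 5, 6, 0, 0], [1, 1, 1, 0, 0])
```

The two orders give different sequences, but in both cases the mask still lines up with the tokens, which is the property worth checking.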
@maxlund looks like it might be possible to add with a model config and no code changes, though not quite sure about the text pooling https://huggingface.co/visheratin/mexma-siglip2/blob/cc7d73c89f452514ec3b996528543d026ad58f72/mexma_siglip.py#L70-L76 It's an XLMRoberta text...
@sky-cake I just merged a PR from @MengqingCao to the main branch that should address this issue
K, I have a PR ready... the device mismatch on the token passed to the stopping criteria is fixed. Also, I don't think the any() logic that was originally put in there in the...
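The comment is truncated, so the original reasoning about any() isn't known here. As a hedged illustration of why any() over a batch is worth questioning in a stopping check, the sketch below contrasts stopping when *any* sequence emits EOS with tracking per-sequence finished flags and stopping only when *all* are done. The function names and the `steps` layout (one list of last tokens per generation step) are hypothetical.

```python
from typing import List, Optional

def stop_step_any(steps: List[List[int]], eos_id: int) -> Optional[int]:
    """Return the first step where ANY sequence emits eos (whole batch stops early)."""
    for i, toks in enumerate(steps):
        if any(t == eos_id for t in toks):
            return i
    return None

def stop_step_all(steps: List[List[int]], eos_id: int) -> Optional[int]:
    """Track a finished flag per sequence; stop only once ALL have emitted eos."""
    finished = [False] * len(steps[0]) if steps else []
    for i, toks in enumerate(steps):
        for j, t in enumerate(toks):
            finished[j] = finished[j] or t == eos_id
        if all(finished):
            return i
    return None

# Two sequences, eos_id=2: sequence 1 finishes at step 0, sequence 0 at step 1.
steps = [[3, 2], [2, 4], [9, 9]]
print(stop_step_any(steps, eos_id=2))   # → 0 (sequence 0 is cut off unfinished)
print(stop_step_all(steps, eos_id=2))   # → 1
```

With any(), the first sequence to finish ends generation for the whole batch; the per-sequence version keeps going until every row is done.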
@ParagEkbote the README could use some tweaking, but I'm not aiming to make it a cookie-cutter copy of the others. There is a TOC for quick jumping. Right now it's not a priority...
@adamjstewart So if you had a model with 13 channels, you might want to convert it so those 13 channels are repeated several times and pass it, say, a 13*4=52...
@adamjstewart k, I think it's pretty straightforward to support that with an extra arg that covers the 'base' or default channels. Below I added a base_chans arg... if you set it...
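The actual code behind the base_chans arg isn't shown in this preview, so what follows is only a sketch of the idea under stated assumptions: a pretrained first-layer weight with `base_chans` input channels (e.g. 13) is tiled until it covers `in_chans` (e.g. 52), then rescaled by `base_chans / in_chans` so a uniform input produces roughly the same summed response. The function `adapt_input_weights` is hypothetical, and spatial kernel dims are dropped (`weight[out][in]` is a single float) to keep it short; the same logic would apply per kernel.

```python
import math
from typing import List

Weight = List[List[float]]  # weight[out_ch][in_ch]; spatial kernel dims omitted

def adapt_input_weights(weight: Weight, in_chans: int, base_chans: int = 3) -> Weight:
    """Hypothetical sketch: tile the base_chans input channels up to in_chans
    and rescale so the total weight mass per output channel is roughly kept."""
    if any(len(row) != base_chans for row in weight):
        raise ValueError("weight input dim must equal base_chans")
    if in_chans == base_chans:
        return [row[:] for row in weight]
    if in_chans == 1:
        # Monochrome: sum the base filters so a single channel sees all of them.
        return [[sum(row)] for row in weight]
    repeats = math.ceil(in_chans / base_chans)
    scale = base_chans / in_chans
    # Tile, truncate to in_chans, then rescale (exact mass only when in_chans
    # is a multiple of base_chans).
    return [[w * scale for w in (row * repeats)[:in_chans]] for row in weight]

# A 13-channel pretrained stem adapted to a 13*4=52 channel input.
w13 = [[float(i) for i in range(13)]]
w52 = adapt_input_weights(w13, in_chans=52, base_chans=13)
print(len(w52[0]), sum(w52[0]))  # → 52 78.0 (same sum as the original row)
```

Defaulting `base_chans` to 3 keeps the usual RGB case unchanged, while setting it lets a non-RGB pretrained stem be tiled the same way.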
@adamjstewart removing the base_chans arg would require adding a space2depth multiplier arg to resolve that ambiguity and support monochrome use with tresnet models...
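To make the ambiguity concrete: TResNet's stem starts with a space-to-depth step of block size 4, so its first conv sees 16x the image channels (48 for RGB, 16 for monochrome). Looking only at a 48-input-channel weight, adaptation code can't tell whether that is 3 image channels x 16 or some other split, which is why either a base-channel count or a multiplier has to be supplied. The pure-Python `space_to_depth` below is an illustrative sketch with an assumed (dy, dx, channel) output ordering, not timm's implementation.

```python
from typing import List

Image = List[List[List[float]]]  # [channels][height][width]

def space_to_depth(x: Image, block: int = 4) -> Image:
    """Move each block x block spatial patch into channels.
    Output: C * block**2 channels, spatial size H/block x W/block.
    Channel order here is (dy, dx, channel), an assumption for illustration."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    out: Image = []
    for dy in range(block):
        for dx in range(block):
            for ch in range(c):
                out.append([[x[ch][y * block + dy][xx * block + dx]
                             for xx in range(w // block)]
                            for y in range(h // block)])
    return out

mono = [[[float(y * 8 + xx) for xx in range(8)] for y in range(8)]]  # 1 x 8 x 8
rgb = mono * 3                                                        # 3 x 8 x 8
print(len(space_to_depth(mono)), len(space_to_depth(rgb)))           # → 16 48
```

So a stem weight expecting 48 channels only maps back to "3 channels" if the x16 multiplier is known; dropping base_chans means that factor has to come from somewhere else.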
@2catycm they are both interesting model families; the problem is they all require custom kernels (or external libraries with custom kernels). Those have proven to be difficult to maintain over...