Ross Wightman comments

510 comments by Ross Wightman

swin v2 adding padding to shifted window attention breaks the algorithm

@alita-moore okay, thanks for clarifying, that makes more sense. It would also be possible to check with the normal pretrained weights; they'll still validate reasonably if the sizes that force the...
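When testing with pretrained weights, it helps to know which input sizes actually force padding. The sketch below reports, per stage, whether the feature map is not a multiple of the window. The defaults (patch size 4, four stages that halve resolution, window 8) are assumptions typical of a Swin V2 "window8_256" style config, and the function name is hypothetical. It also ignores any window-size clamping a real model may apply.

```python
def stages_needing_padding(img_size, patch_size=4, num_stages=4, window=8):
    """Per stage: (stage index, square feature-map size, needs padding?).

    Assumes each stage after the first halves H and W (patch merging) and
    ignores any window-size clamping a real implementation might do.
    """
    report = []
    size = img_size // patch_size  # resolution after patch embedding
    for stage in range(num_stages):
        if stage > 0:
            size //= 2  # patch merging halves the resolution
        report.append((stage, size, size % window != 0))
    return report


for img_size in (256, 224):
    print(img_size, stages_needing_padding(img_size))
```

With these assumed settings, 256 never needs padding, while 224 pads at every stage after the first. This makes 224 a convenient size for exercising the padded path.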

swin v2 adding padding to shifted window attention breaks the algorithm

Shouldn't the attention mask be adjusted in either case? I feel ensuring the mask makes sense is more important than padding before vs. after the shift, as both will alter the validity...
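As one illustration of "adjusting the mask", here is a minimal torch-free sketch that assumes padding is applied before the cyclic shift. Each token is labelled in the original (unshifted, padded) frame, the labels are rolled the same way the features would be, and attention is allowed only between tokens with equal labels. Several choices here are hypothetical and not taken from timm or the Swin code: the function names, giving every padded token one shared label, and the -100.0 additive value.

```python
def ceil_to(n, m):
    """Round n up to the nearest multiple of m."""
    return -(-n // m) * m


def shifted_window_attn_mask(H, W, window, shift, neg=-100.0):
    """Additive masks, one (window**2 x window**2) nested list per window.

    Assumes the H x W map is padded bottom/right to multiples of `window`
    BEFORE the shift, which behaves like
    torch.roll(x, shifts=(-shift, -shift), dims=(1, 2)) on the padded map.
    """
    Hp, Wp = ceil_to(H, window), ceil_to(W, window)

    def label(r, c):
        # Hypothetical choice: all padded tokens share one id, so real
        # tokens never attend to padding.
        if r >= H or c >= W:
            return -1
        # Tokens from the first `shift` rows/cols wrap to the far edge after
        # the roll and must not attend to their new, non-adjacent neighbours.
        return 2 * (r < shift) + (c < shift)

    # Rolled position (i, j) holds the token originally at
    # ((i + shift) % Hp, (j + shift) % Wp).
    rolled = [[label((i + shift) % Hp, (j + shift) % Wp) for j in range(Wp)]
              for i in range(Hp)]

    masks = []
    for wi in range(0, Hp, window):
        for wj in range(0, Wp, window):
            ids = [rolled[i][j]
                   for i in range(wi, wi + window)
                   for j in range(wj, wj + window)]
            # 0.0 where query and key share a label, `neg` otherwise.
            masks.append([[0.0 if a == b else neg for b in ids] for a in ids])
    return masks
```

For a 7x7 map with window 4 and shift 2, the top-left window is fully unmasked. The bottom-right window mixes real, wrapped, and padded tokens, so the mask blocks them from attending to each other. Because the labels are computed before rolling, the same function also shows how the mask would need to change if padding were instead applied after the shift.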

Add support for visheratin/mexma-siglip2

@maxlund it looks like it might be possible to add this with a model config and no code changes, though I'm not quite sure about the text pooling: https://huggingface.co/visheratin/mexma-siglip2/blob/cc7d73c89f452514ec3b996528543d026ad58f72/mexma_siglip.py#L70-L76 It's an XLMRoberta text...
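For context on what "text pooling" can mean for an XLM-RoBERTa text tower, the sketch below shows two common options: taking the first (`<s>`) token, or an attention-mask-weighted mean over tokens. This is not a claim about which one the linked `mexma_siglip.py` uses. The function names are hypothetical, and nested lists stand in for `[seq_len, dim]` tensors.

```python
def first_token_pool(hidden):
    """Use the embedding at the first position (the <s> token in XLM-R)."""
    return list(hidden[0])


def masked_mean_pool(hidden, attention_mask):
    """Average token embeddings, skipping padded positions (mask == 0)."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / max(count, 1) for t in total]


hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
print(first_token_pool(hidden), masked_mean_pool(hidden, [1, 1, 0]))
```

If the checkpoint relies on a pooling variant that no existing config option expresses, that is where code changes would come in.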

Coca Image-to-Text, RuntimeError: Boolean value of Tensor with more than one value is ambiguous

@sky-cake I just merged a PR from @MengqingCao into the main branch that should address this issue.

Coca Image-to-Text, RuntimeError: Boolean value of Tensor with more than one value is ambiguous

K, I have a PR ready... the device mismatch on the token passed to the stopping criteria is fixed. Also, I don't think the any() logic that was originally put in there in the...
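The RuntimeError in this issue comes from truth-testing a multi-element tensor (an implicit `bool(tensor)`). Below is a torch-free sketch of a batched decode loop that keeps per-sequence "done" flags and reduces them explicitly before deciding to stop. It is hypothetical, not the actual CoCa generation code: `step_fn` stands in for a model forward plus token selection, and stopping only when all sequences finish is an assumed policy.

```python
def generate_until_eos(step_fn, batch_size, eos_id, max_len):
    """Decode until every sequence has emitted eos_id or max_len is hit.

    step_fn(outputs) -> one next-token id per sequence (hypothetical
    interface standing in for a model forward + token selection).
    """
    outputs = [[] for _ in range(batch_size)]
    done = [False] * batch_size
    for _ in range(max_len):
        next_tokens = step_fn(outputs)
        for b, tok in enumerate(next_tokens):
            if not done[b]:  # finished sequences stop growing
                outputs[b].append(tok)
                done[b] = tok == eos_id
        # Explicit reduction to a single bool; in torch, `if flags:` on a
        # multi-element tensor raises the "ambiguous" RuntimeError instead.
        if all(done):
            break
    return outputs


scripts = [[5, 2], [7, 8, 2]]  # made-up token streams, eos_id = 2


def step(outs):
    return [s[len(o)] if len(o) < len(s) else 0 for s, o in zip(scripts, outs)]


print(generate_until_eos(step, batch_size=2, eos_id=2, max_len=10))
```

Swapping `all(done)` for `any(done)` would stop the whole batch as soon as the shortest sequence finishes. That choice is worth settling deliberately.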

[FEATURE] Refactor ReadMe

@ParagEkbote the README could use some tweaking, but I'm not aiming to make it a cookie-cutter copy of the others. There is a TOC for quick jumping. Right now it's not a priority...

[FEATURE] timm.models.adapt_input_conv: beyond RGB weights

@adamjstewart So if you had a model with 13 channels, you might want to convert it so those 13 channels are repeated several times and pass it, say, a 13*4=52...
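The arithmetic behind repeating channels can be sketched as follows. The per-channel weights are tiled cyclically up to the new channel count and scaled by base / in_chans. If the input is the original 13 bands repeated 4 times, the filter response is then unchanged. This is a pure-Python sketch with one 1x1 filter standing in for a conv weight, and the helper names are hypothetical.

```python
def tile_channels(weights, in_chans):
    """Repeat per-channel weights cyclically up to in_chans, then rescale.

    weights: one filter's per-input-channel values (1x1 kernel stand-in).
    Scaling by base / in_chans keeps the response unchanged when the input
    is the original bands repeated in_chans / base times.
    """
    base = len(weights)
    scale = base / in_chans
    return [weights[i % base] * scale for i in range(in_chans)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


w13 = [0.1 * i for i in range(13)]   # one filter over 13 bands
x13 = [float(i) for i in range(13)]  # one 13-band pixel
w52 = tile_channels(w13, 52)         # 13 * 4 = 52 input channels
x52 = x13 * 4                        # the same bands repeated 4 times
print(len(w52), dot(w13, x13), dot(w52, x52))
```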

[FEATURE] timm.models.adapt_input_conv: beyond RGB weights

@adamjstewart k, I think it's pretty straightforward to support that with an extra arg that covers the 'base' or default channels. Below I added a base_chans arg... if you set it...
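A sketch of what an extra `base_chans` argument could look like follows. Note that the comment names `base_chans`, but its behaviour below is an assumption, not the code from that PR, and `adapt_input_weights` is a hypothetical torch-free stand-in for `timm.models.adapt_input_conv` with 1x1 kernels as scalars. The sketch sums channels for monochrome, tiles and rescales for other counts, and treats `base_chans` as the channel count the pretrained weight expects in place of a hard-coded 3.

```python
def adapt_input_weights(in_chans, weight, base_chans=3):
    """Adapt conv input weights to in_chans channels.

    weight: O filters, each a list of I per-input-channel values
    (1x1 kernel stand-in). base_chans replaces a hard-coded RGB (3).
    """
    num_in = len(weight[0])
    if in_chans == num_in:
        return [row[:] for row in weight]  # nothing to adapt
    if num_in != base_chans:
        raise NotImplementedError("weight format not supported by conversion")
    if in_chans == 1:
        # Monochrome: sum over the base channels.
        return [[sum(row)] for row in weight]
    # Otherwise tile the base channels cyclically and rescale.
    scale = base_chans / in_chans
    return [[row[i % base_chans] * scale for i in range(in_chans)]
            for row in weight]


print(adapt_input_weights(6, [[1.0, 2.0, 3.0]]))
```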

[FEATURE] timm.models.adapt_input_conv: beyond RGB weights

@adamjstewart removing the base_chans arg would require adding a space2depth multiplier arg to resolve the ambiguity that comes up when supporting monochrome use with TResNet models...
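The ambiguity can be sketched like this. A space-to-depth stem folds spatial blocks into channels (assumed here: block size 4, so a 3-channel image arrives as 3 * 16 = 48 channels). Given only the weight, 48 input channels could mean "48 base channels" or "16 blocks of 3". Reducing to monochrome needs to know the multiplier. The helper name is hypothetical, and the channel layout (colour channels in consecutive groups) is an assumption.

```python
def monochrome_from_s2d(weight, s2d_mult):
    """Collapse each group of colour channels to one channel per s2d block.

    weight: O filters, each with I = s2d_mult * base values (1x1 stand-in),
    assumed laid out as s2d_mult consecutive groups of `base` channels.
    """
    num_in = len(weight[0])
    if num_in % s2d_mult:
        raise ValueError("input channels must be divisible by s2d_mult")
    base = num_in // s2d_mult
    return [[sum(row[g * base:(g + 1) * base]) for g in range(s2d_mult)]
            for row in weight]


w = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
print(monochrome_from_s2d(w, 2))  # read as 2 blocks of 3 channels
print(monochrome_from_s2d(w, 1))  # read as 6 base channels, no s2d
```

The same weight gives different results depending on the multiplier. Hence either `base_chans` or an explicit space2depth multiplier has to carry that information.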

[FEATURE] Support for RWKV and MAMBA architecture

@2catycm they are both interesting model families; the problem is they all require custom kernels (or external libraries with custom kernels). Those have proven difficult to maintain over...