Ross Wightman comments

Results: 497 comments of Ross Wightman


OpenAI CLIP -> OpenCLIP Conversion guide for H-14

@trufty one big difference to consider is that OpenAI interfaces are oriented towards inference, and this package covers both training (from scratch, from pretrained) and inference cases. We could at...

OpenAI CLIP -> OpenCLIP Conversion guide for H-14

@trufty create_model_from_pretrained is merged, and device accepts either string or torch.device now

[FEATURE] Add Next-ViT

I'm not opposed to including Next-ViT, but, as with @bonlime, I'm not a fan of the pooling as originally implemented and think it should be improved. In my recent CoAtNet...

[FEATURE] Add Next-ViT

Something else to note, using the 'fast norm' options in timm now improves BCHW especially, without the upcast to float32 caused by builtin torch LayerNorm (or GN), channels_last layout will...

[BUG] CPU Memory Leakage on TPU

@zeyuwang615 I believe this might be a problematic use case for TPU right now. With Python scalars being used, updating per layer and per step like this might be triggering...

Properly get feature maps from Swin-V2 instead of norm

@sarmientoj24 unfortunately, swin v1 and v2 (adapted from the official microsoft modelling code) put the downsample at the end of blocks, so you can't simply take the output of each...
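
The effect of the downsample placement described above can be sketched with plain shape arithmetic. This is an illustrative sketch only, not timm or Microsoft code: the function name `stage_shapes` is hypothetical, and the defaults (224 input, patch size 4, embed dim 96, four stages) are assumed Swin-T-like values.

```python
def stage_shapes(img_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return two lists of (side, channels): what each stage's blocks
    operate on, and what each stage returns when the downsample sits
    at the END of the stage (hypothetical helper, for illustration)."""
    side, dim = img_size // patch_size, embed_dim
    block_shapes, stage_outputs = [], []
    for i in range(num_stages):
        # Shape seen by the transformer blocks of this stage.
        block_shapes.append((side, dim))
        # Patch merging at the end of every stage except the last:
        # halves the spatial side and doubles the channels.
        if i < num_stages - 1:
            side, dim = side // 2, dim * 2
        # Shape that the stage module actually hands to the next stage.
        stage_outputs.append((side, dim))
    return block_shapes, stage_outputs


block_shapes, stage_outputs = stage_shapes()
for i, (blk, out) in enumerate(zip(block_shapes, stage_outputs)):
    print(f"stage {i}: blocks see {blk}, stage returns {out}")
```

Under these assumptions the stage outputs are already one step downsampled relative to the blocks that produced them, so hooking each stage's output does not give the usual stride 4/8/16/32 feature pyramid.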

Properly get feature maps from Swin-V2 instead of norm

@sarmientoj24 it'd be worth testing your use case with `swinv2_cr_small_ns_224` to see if it works better; if that's the case I could prioritize making the other v1/v2 models available through...

Properly get feature maps from Swin-V2 instead of norm

@sarmientoj24 it was based on his impl but ended up with quite a few changes. I didn't include the sequential attn as I wasn't convinced it was working properly... The...

Properly get feature maps from Swin-V2 instead of norm

@sarmientoj24 no, they did not release it or any of the 'really' big models that use it like giant...

Using timm gcvit models and getting errors when using resolution other than 224x224

@sarmientoj24 they are fixed resolution models due to the position embedding and window size/shape calcs. To create a model with a different image size, you need to pass to model at creation...
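
Why window and position-embedding calculations tie a model to one resolution can be sketched generically. The snippet below is an assumption-laden illustration, not GCViT or timm code: `window_layout` is a hypothetical helper, and it assumes square, non-overlapping windows with a Swin-style relative position bias table of (2w - 1)^2 entries per head.

```python
def window_layout(feat_size, window_size):
    """Return (num_windows, bias_table_entries) for a square feature map
    split into square, non-overlapping windows (hypothetical helper)."""
    if feat_size % window_size != 0:
        raise ValueError(
            f"feature map side {feat_size} is not divisible by "
            f"window size {window_size}"
        )
    num_windows = (feat_size // window_size) ** 2
    # The relative position bias table is sized from the window, so
    # changing the window size changes the parameter shape.
    bias_table_entries = (2 * window_size - 1) ** 2
    return num_windows, bias_table_entries


# 224 input at stride 4 gives a 56x56 map, which tiles with 7x7 windows.
print(window_layout(224 // 4, 7))

# 384 input at stride 4 gives a 96x96 map, which 7x7 windows cannot tile.
try:
    window_layout(384 // 4, 7)
except ValueError as err:
    print("error:", err)
```

In this sketch a new input size either fails to tile or needs a different window size, and a different window size means a differently shaped bias table, which is why such values have to be fixed when the model is constructed.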