Ross Wightman comments

Results: 510 comments by Ross Wightman

Image preprocessor default when loading checkpoints

@jn2clark there are no defaults for the architecture; the arch config covers only the model. The preprocess cfg (mean/std) is part of the pretrained mappings and there is one per...
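To make the point concrete, here is a minimal sketch assuming a hypothetical registry keyed by `arch.tag` names (`PRETRAINED_CFGS`, `my_arch`, `tag_a` and `tag_b` are illustrative, not timm internals). Two sets of weights for the same architecture carry different mean/std, so the architecture name alone cannot supply a default. In recent timm versions the resolved values can be obtained from a loaded model with `timm.data.resolve_model_data_config(model)`.

```python
from typing import Dict, List, Sequence, Tuple

Cfg = Dict[str, Tuple[float, ...]]

# Hypothetical registry: one preprocessing cfg per *pretrained weight tag*.
# The bare architecture name ("my_arch") has no entry, so there is no default.
PRETRAINED_CFGS: Dict[str, Cfg] = {
    "my_arch.tag_a": {"mean": (0.485, 0.456, 0.406), "std": (0.229, 0.224, 0.225)},
    "my_arch.tag_b": {"mean": (0.5, 0.5, 0.5), "std": (0.5, 0.5, 0.5)},
}


def get_preprocess_cfg(model_name: str) -> Cfg:
    """Look up mean/std for an 'arch.tag' name; raise if no tag is given."""
    if model_name not in PRETRAINED_CFGS:
        raise KeyError(f"no preprocessing cfg for {model_name!r}; specify a pretrained tag")
    return PRETRAINED_CFGS[model_name]


def normalize_pixel(rgb: Sequence[float], cfg: Cfg) -> List[float]:
    """Apply per-channel (x - mean) / std to one RGB pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, cfg["mean"], cfg["std"])]


pixel = (0.5, 0.5, 0.5)
print(normalize_pixel(pixel, get_preprocess_cfg("my_arch.tag_b")))  # [0.0, 0.0, 0.0]
print(normalize_pixel(pixel, get_preprocess_cfg("my_arch.tag_a")))  # different values
```

The same input pixel normalizes differently depending on which weights are loaded, which is why the cfg lives with the pretrained entry rather than the architecture.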

Image preprocessor default when loading checkpoints

I have thought about this a little bit in the context of #883 (that solution doesn't work), but could add support for saving/loading a folder w/ the full config + checkpoint.
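One way such a folder could look, as a sketch only: the file names (`config.json`, `checkpoint.bin`), the config keys, and the `save_folder`/`load_folder` helpers are hypothetical, and the raw bytes stand in for a real state dict (normally written with `torch.save` or safetensors). The idea is that the preprocessing values travel with the weights instead of being looked up by architecture.

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple


def save_folder(folder: str, config: Dict[str, Any], checkpoint: bytes) -> None:
    """Hypothetical layout: config.json (arch + preprocessing) next to the weights file."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "config.json"), "w") as f:
        json.dump(config, f, indent=2)
    with open(os.path.join(folder, "checkpoint.bin"), "wb") as f:
        f.write(checkpoint)


def load_folder(folder: str) -> Tuple[Dict[str, Any], bytes]:
    """Read back the config and the raw checkpoint bytes from the same folder."""
    with open(os.path.join(folder, "config.json")) as f:
        config = json.load(f)
    with open(os.path.join(folder, "checkpoint.bin"), "rb") as f:
        checkpoint = f.read()
    return config, checkpoint


# Lists rather than tuples, since JSON round-trips tuples as lists.
config = {
    "architecture": "my_arch",  # hypothetical model name
    "num_classes": 10,
    "pretrained_cfg": {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5], "input_size": [3, 224, 224]},
}
with tempfile.TemporaryDirectory() as tmp:
    save_folder(tmp, config, b"\x00fake-weights")
    loaded_cfg, loaded_ckpt = load_folder(tmp)

print(loaded_cfg["pretrained_cfg"]["mean"])  # [0.5, 0.5, 0.5]
```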

'Use this Model' code snippets for `timm` models in Transformers could use improvements

@coyotte508 thanks! My day-to-day does not involve any internal repos, so I'm not an hf-internal member and can't see the code there. Might be a good time for me to...

'Use this Model' code snippets for `timm` models in Transformers could use improvements

So it looks like the metadata does indeed have the right info. The config.jsons for timm are not Transformers configs though, so adding those fields doesn't make sense; it'd be more infer...

'Use this Model' code snippets for `timm` models in Transformers could use improvements

@julien-c All of the timm models can be used as image classifiers or feature extractors with transformers, including via the AutoModel/AutoProcessor and pipeline APIs (https://huggingface.co/blog/timm-transformers). This also allows timm models to work with...

SigLip memory consumption increases as we scale number of GPUs

@khalidsaifullaah yeah, it's not working quite as efficiently as it should. I feel my current isend/irecv impl, while in theory it should be reasonable, it appears it may not a well...
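For context on what the isend/irecv exchange distributes: the SigLIP loss sums an independent sigmoid term over every image/text pair, so it can be computed block by block while text features are passed around a ring of ranks. Below is a single-process NumPy simulation of that idea, assuming a simple one-direction ring; it is not the implementation discussed in this thread, the function names are made up, and it says nothing about the memory behaviour reported here. It only checks that the blockwise ring sum matches the loss over the full gathered batch.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def pair_loss_sum(img, txt, t, b, positives_on_diag):
    """Sum of -log sigmoid(z * (t * <x, y> + b)) over one image block x text block."""
    logits = t * img @ txt.T + b
    z = -np.ones_like(logits)          # every pair is a negative by default
    if positives_on_diag:
        np.fill_diagonal(z, 1.0)       # matching image/text pairs are positives
    return -log_sigmoid(z * logits).sum()


def full_siglip_loss(X, Y, t, b):
    """Reference: loss over the full (gathered) N x N pair matrix, divided by N."""
    return pair_loss_sum(X, Y, t, b, True) / X.shape[0]


def ring_siglip_loss(X, Y, t, b, world_size):
    """Simulated ring: each 'rank' keeps its images and sees every text block once."""
    imgs = np.split(X, world_size)
    held = np.split(Y, world_size)     # text block currently held by each simulated rank
    total = 0.0
    for step in range(world_size):
        for rank in range(world_size):
            # Only step 0 pairs a rank's images with its own texts (the positives).
            total += pair_loss_sum(imgs[rank], held[rank], t, b, step == 0)
        # Stand-in for isend to rank + 1 / irecv from rank - 1.
        held = [held[(rank - 1) % world_size] for rank in range(world_size)]
    return total / X.shape[0]


rng = np.random.default_rng(0)
W, B, D = 4, 3, 8                      # simulated world size, local batch, feature dim
X = rng.normal(size=(W * B, D))
Y = rng.normal(size=(W * B, D))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
t, b = 10.0, -10.0                     # the SigLIP paper's initial temperature and bias
print(np.isclose(full_siglip_loss(X, Y, t, b), ring_siglip_loss(X, Y, t, b, W)))  # True
```

In a real multi-GPU setup the list rotation would be replaced by point-to-point sends and receives between neighbouring ranks.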

SigLip memory consumption increases as we scale number of GPUs

@khalidsaifullaah I'm experimenting with different impls of the loss in #971 to see if any scale better ... feel free to try; feedback would be welcome.

SigLip memory consumption increases as we scale number of GPUs

@khalidsaifullaah @long8v FWIW I wouldn't necessarily say that no extra overhead as the world size increases is the passing criterion; I feel that with gradient buffers, allocator behaviour, etc. there's still likely...

swin v2 adding padding to shifted window attention breaks the algorithm

@alita-moore hmm, yeah, might be a concern. Have you compared the results... force a situation where the padding is needed (it's not usually active) and then see how the accuracy...

swin v2 adding padding to shifted window attention breaks the algorithm

@alita-moore the models weren't trained with that padding. It won't be active unless you use resized inputs, set `strict_img_size=False`, `always_partition=True`, etc. These are non-standard settings to allow flexibility for some...
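A toy 1-D illustration of when that padding becomes active, assuming the usual pad-then-cyclic-shift scheme. The helpers `pad_amount` and `shifted_windows_1d` are hypothetical and ignore attention masks entirely; they only show that padding is added when the token count isn't a multiple of the window size, and that after the shift the padded position lands in a window alongside real tokens.

```python
from typing import List, Optional


def pad_amount(size: int, window: int) -> int:
    """Padding needed so `size` becomes a multiple of `window` (0 if already divisible)."""
    return (window - size % window) % window


def shifted_windows_1d(length: int, window: int, shift: int) -> List[List[Optional[int]]]:
    """Toy 1-D version: pad to a multiple of `window`, cyclically shift, then partition.

    Real tokens are shown as their original indices; padding tokens are None.
    """
    padded: List[Optional[int]] = list(range(length)) + [None] * pad_amount(length, window)
    rolled = padded[shift:] + padded[:shift]  # same as torch.roll(x, -shift) along this axis
    return [rolled[i:i + window] for i in range(0, len(rolled), window)]


print(pad_amount(8, 4))              # 0, divisible: no padding is active
print(shifted_windows_1d(7, 4, 2))   # [[2, 3, 4, 5], [6, None, 0, 1]]
```

With a divisible size the padding path never runs, which matches it not being active with standard settings.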