Ross Wightman

Results: 522 comments by Ross Wightman

@mehdidc the position, token class embeddings are typically not decayed as well but looks like that was never done in OpenCLIP, hrmm. I have `no_weight_decay` methods in timm that return...
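
A minimal sketch of the kind of parameter grouping this comment alludes to, assuming a model exposes a `no_weight_decay()` method that returns a set of parameter names. The comment is cut off before saying what the timm methods return, so that return type, the toy `TinyViT` model, and the `param_groups` helper are all illustrative assumptions, not timm's or OpenCLIP's actual code.

```python
import torch
from torch import nn


class TinyViT(nn.Module):
    """Hypothetical toy model, only here to provide realistic parameter names."""

    def __init__(self, dim=32, num_tokens=16):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, dim))
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def no_weight_decay(self):
        # Assumed convention: names of parameters to exclude from weight decay.
        return {"cls_token", "pos_embed"}


def param_groups(model, weight_decay=0.05):
    """Split parameters into a decayed group and an undecayed group."""
    skip = model.no_weight_decay() if hasattr(model, "no_weight_decay") else set()
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # 1-D parameters (biases, norm scales) are commonly left undecayed too.
        if param.ndim <= 1 or name in skip:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


model = TinyViT()
groups = param_groups(model)
optimizer = torch.optim.AdamW(groups, lr=1e-3)
```

With this toy model, only `proj.weight` ends up in the decayed group; the embeddings, biases, and norm parameters all get `weight_decay=0.0`.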

On the other topic, I feel it's fine to support nightlies only, I'm already exclusively using nightlies to train the convnext models because it's the only way to get decent...

@laserprec looks like you've moved on from MS, but might know who to ping there if there's any interest in keeping this project alive. Thanks

@rentainhe @Bailey-24 took some time to get around to this, but on main branch, all swin v1 & v2 models support feat extraction now. Be aware they are NHWC outputs,...
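
For downstream layers that expect channels-first input, one way to handle the NHWC layout is a permute. This sketch uses a random tensor as a stand-in for a feature map; the shape `(2, 7, 7, 768)` is made up for illustration and is not taken from any particular model.

```python
import torch

# Stand-in for one feature map in NHWC layout: (batch, height, width, channels).
feat_nhwc = torch.randn(2, 7, 7, 768)

# Move channels to dim 1 for layers that expect NCHW (e.g. nn.Conv2d).
# .contiguous() materializes the new memory layout after the permute.
feat_nchw = feat_nhwc.permute(0, 3, 1, 2).contiguous()

print(feat_nchw.shape)  # torch.Size([2, 768, 7, 7])
```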

Old version, need recent 0.8.x see readme

On Thu, Apr 6, 2023, 10:28 AM Mojtaba ***@***.***> wrote:
> When I try your code I get this:
> AttributeError: 'SwinTransformer' object...

@makao007 will look at this, haven't tried w/ python 3.11 yet as there wasn't a torchvision build available, is there a 3.11 torchvision?

Code instances where this is either definitely a concern, or likely one (depending on the ranges involved):

https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/llama/modeling_llama.py#L130-L131
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/llama/modeling_llama.py#L140
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/llama/modeling_llama.py#L168
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/llama/modeling_llama.py#L195
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/examples/research_projects/bertabs/modeling_bertabs.py#L265-L266
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/codegen/modeling_codegen.py#L55-L59
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/conditional_detr/modeling_conditional_detr.py#L437-L455
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/conditional_detr/modeling_conditional_detr.py#L496-L509
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/ctrl/modeling_ctrl.py#L47-L60
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/deformable_detr/modeling_deformable_detr.py#L494-L495
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/deformable_detr/modeling_deformable_detr.py#L620-L621
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/deformable_detr/modeling_deformable_detr.py#L1542-L1543
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/deprecated/transfo_xl/modeling_transfo_xl.py#L945-L946
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/deta/modeling_deta.py#L404-L405
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/deta/modeling_deta.py#L529-L530
https://github.com/huggingface/transformers/blob/8278b1538ecc89dad8ebca510a31a86bc8645edb/src/transformers/models/deta/modeling_deta.py#L1453-L1454
...

If we look at the original Llama code, this issue is avoided.
https://github.com/facebookresearch/llama/blob/ef351e9cd9496c579bf9f2bb036ef11bdc5ca3d2/llama/model.py#L100-L104

```python
freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
t =...
```
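
The quoted snippet builds an integer `arange` and only then calls `.float()`. A rough illustration of why that ordering can matter, assuming the concern is position indices that end up stored in a low-precision floating-point dtype. This sketch emulates IEEE half precision with the standard library's `struct` module rather than calling torch, and the `to_fp16` helper and `seq_len` value are made up for the example.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE half precision and back (stdlib only)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


seq_len = 4096

# Lossy: each position index is stored as a half-precision float.
lossy = [to_fp16(float(i)) for i in range(seq_len)]

# Safe: keep positions as integers and convert to full-precision floats after.
safe = [float(i) for i in range(seq_len)]

# Count how many positions are no longer distinguishable from a neighbour.
collisions = seq_len - len(set(lossy))

# First index whose half-precision value differs from the true position.
first_bad = next(i for i in range(seq_len) if lossy[i] != safe[i])

print(first_bad, lossy[first_bad], collisions)
```

Half precision has an 11-bit significand, so not every integer above 2048 is representable and neighbouring positions start to share a value. bfloat16 has an 8-bit significand, so the same effect sets in above 256.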

I believe this is the problem being seen in this issue https://github.com/microsoft/DeepSpeed/issues/4932 and I'm now also seeing that this may be a dupe of https://github.com/huggingface/transformers/issues/28596

Related to this possible concern with the zero.Init() overriding dtype for arange (and I did confirm this is a problem with a test bench), there's also an overlapping issue that's...