DPT implementation contains unused parameters
System Info
Kind of irrelevant, but:
- `transformers` version: 4.40.0
- Platform: macOS-14.4.1-arm64-arm-64bit
- Python version: 3.9.16
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.22.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.0 (False)
- Tensorflow version (GPU?): 2.13.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: n/a
- Using distributed or parallel set-up in script?: yes
Who can help?
The first (zeroth) fusion layer's residual_layer1 is never used, causing issues like this when run with DDP:
Parameters which did not receive grad for rank 3: neck.fusion_stage.layers.0.residual_layer1.convolution2.bias,
neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias,
neck.fusion_stage.layers.0.residual_layer1.convolution1.weight
@amyeroberts
Information
- [X] The official example scripts
- [X] My own modified scripts
Reproduction
We take the code from the DPT doc page https://huggingface.co/docs/transformers/main/en/model_doc/dpt, run a forward-backward pass, and check for unused parameters:
import torch
from transformers import Dinov2Config, DPTConfig, DPTForDepthEstimation
# initialize with a Transformer-based backbone such as DINOv2
# in that case, we also specify `reshape_hidden_states=False` to get feature maps of shape (batch_size, num_channels, height, width)
backbone_config = Dinov2Config.from_pretrained("facebook/dinov2-base", out_features=["stage1", "stage2", "stage3", "stage4"], reshape_hidden_states=False)
config = DPTConfig(backbone_config=backbone_config)
model = DPTForDepthEstimation(config=config)
out = model(torch.rand(1, 3, 512, 512))
loss = out.predicted_depth.mean()
loss.backward()
for n, p in model.named_parameters():
    if p.grad is None:
        if 'backbone' in n:
            continue  # part of the backbone is not used and that is fine
        print(f"found unused param, {n}")
Result:
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution1.weight
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution2.weight
found unused param, neck.fusion_stage.layers.0.residual_layer1.convolution2.bias
This prevents DDP training. To fix that, one should add a line to the DPT model:
self.neck.fusion_stage.layers[0].residual_layer1 = None
Here is the same fix in mmsegmentation:
https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/decode_heads/dpt_head.py#L271
self.fusion_blocks[0].res_conv_unit1 = None
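Until a fix lands in the library, the same line also works as a user-side workaround right after instantiating the model. A minimal sketch, reusing the reproduction setup above:

import torch
from transformers import Dinov2Config, DPTConfig, DPTForDepthEstimation

backbone_config = Dinov2Config.from_pretrained(
    "facebook/dinov2-base",
    out_features=["stage1", "stage2", "stage3", "stage4"],
    reshape_hidden_states=False,
)
model = DPTForDepthEstimation(config=DPTConfig(backbone_config=backbone_config))

# drop the never-used residual unit so every remaining (non-frozen) parameter
# receives a gradient and DDP with find_unused_parameters=False stops complaining
model.neck.fusion_stage.layers[0].residual_layer1 = None

# sanity check: the parameters from the error message should no longer show up
out = model(torch.rand(1, 3, 512, 512))
out.predicted_depth.mean().backward()
unused = [n for n, p in model.named_parameters() if p.grad is None and "backbone" not in n]
print(unused)  # expected: []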
I can submit a PR that makes this fix.
Expected behavior
The model should not have unused parameters.
Hi @ducha-aiki, thanks for reporting!
You are right, it looks like we can safely delete layers[0].residual_layer1 from DPTFeatureFusionStage because it's never used.
Would you mind sharing why this prevents DDP training?
@qubvel I believe I shared this above:
Parameters which did not receive grad for rank 3: neck.fusion_stage.layers.0.residual_layer1.convolution2.bias, neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias, neck.fusion_stage.layers.0.residual_layer1.convolution1.weight
That is a quote from the crash message I get when running multi-GPU with accelerate and specifying ddp_find_unused_parameters=False in the Trainer.
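For context, this is roughly the setup that triggers it (a sketch; output_dir, model, and train_dataset are placeholders, not from my actual script):

from transformers import Trainer, TrainingArguments

# placeholders: model and train_dataset come from the user's own training script
args = TrainingArguments(
    output_dir="dpt-depth",
    # with this set to False, DDP errors out as soon as any trainable
    # parameter (e.g. layers[0].residual_layer1) receives no gradient
    ddp_find_unused_parameters=False,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()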
Thank you, I missed it 🙂 I am trying to understand why the unused backbone weights are not blocking while the neck's are. Did you try training with the fix? Anyway, if this solves the issue it is worth a PR.
@qubvel good point about the backbone. Probably because I trained with a frozen backbone, which is fairly common. As for removing the backbone's unused params, that would probably require too many changes. I will do a PR then, thanks.
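For reference, a minimal sketch of freezing the backbone (assuming the DINOv2-backbone configuration from the reproduction above, where it is exposed as model.backbone). DDP only tracks parameters with requires_grad=True, which is why the backbone's unused weights never trip the error while the neck's do:

# freeze the backbone so none of its parameters require grads;
# frozen parameters are ignored by DDP's gradient-readiness check
for param in model.backbone.parameters():
    param.requires_grad = False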
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.