J-shang

Results: 5 issues by J-shang

### Description ###

Use parallel operations instead of loops in `BankSparsityAllocator`

#### Test Options ####

- [ ] fast test
- [ ] full test - HPO
- [ ]...
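To illustrate what replacing loops with parallel operations can look like for bank-wise sparsity, here is a minimal NumPy sketch. It is not the `BankSparsityAllocator` code: the function names, the 1-D `metric` layout, and the `bank_size` and `keep` parameters are all assumptions made for illustration. Both functions keep the `keep` largest entries in each consecutive bank of `bank_size` elements; the second does so with one reshape and one sort instead of a Python loop.

```python
import numpy as np


def bank_mask_loop(metric, bank_size, keep):
    """Hypothetical loop version: handle one bank at a time (keep >= 1)."""
    mask = np.zeros_like(metric, dtype=bool)
    for start in range(0, metric.size, bank_size):
        bank = metric[start:start + bank_size]
        top = np.argsort(bank)[-keep:]      # indices of the largest entries in this bank
        mask[start + top] = True
    return mask


def bank_mask_vectorized(metric, bank_size, keep):
    """Hypothetical vectorized version: all banks in one shot (keep >= 1).

    Assumes metric.size is a multiple of bank_size.
    """
    banks = metric.reshape(-1, bank_size)           # one row per bank
    top = np.argsort(banks, axis=1)[:, -keep:]      # per-row indices of the largest entries
    mask = np.zeros_like(banks, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)      # mark the kept entries in every row
    return mask.reshape(metric.shape)


metric = np.array([0.1, 0.9, 0.3, 0.7, 0.8, 0.2, 0.6, 0.4])
print(bank_mask_vectorized(metric, bank_size=4, keep=2))
```

With distinct metric values the two versions produce the same mask; with ties the chosen indices may differ, which a real implementation would need to settle explicitly.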

### Description ###

#### Test Options ####

- [ ] fast test
- [ ] full test - HPO
- [x] full test - NAS
- [ ] full test...

### Description ###

Example:

- End-to-end pruning example for `bert` on the MNLI dataset, reporting results from a few experiments.

Enhancements:

- MovementPruner: (Experimental) Support `soft` movement pruning (only support `hard`...
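For readers unfamiliar with the `hard` and `soft` terms, the sketch below shows the distinction as it is usually described for movement pruning: the hard variant keeps a fixed fraction of weights with the highest learned scores, while the soft variant keeps every weight whose score exceeds a threshold. This is an illustration only, not the `MovementPruner` implementation; the function names and the `sparsity` and `threshold` parameters are hypothetical.

```python
def hard_mask(scores, sparsity):
    """Hypothetical hard variant: keep the top (1 - sparsity) fraction by score."""
    n_pruned = int(len(scores) * sparsity)
    n_kept = len(scores) - n_pruned
    # Indices ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:n_kept])
    return [i in kept for i in range(len(scores))]


def soft_mask(scores, threshold):
    """Hypothetical soft variant: keep every weight whose score exceeds the threshold."""
    return [s > threshold for s in scores]


scores = [0.9, -0.2, 0.4, 0.1]
print(hard_mask(scores, sparsity=0.5))   # fixed number of weights kept
print(soft_mask(scores, threshold=0.0))  # number kept depends on the scores
```

The practical difference is that the hard variant fixes the sparsity level in advance, whereas the soft variant lets the final sparsity emerge from the scores and the chosen threshold.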

### Description ###

Fix #4875

- [x] Fix infinite loop in `ChannelDependency._get_parent_layers`
- [ ] Fix buffers that cannot be traced?
  - https://github.com/facebookresearch/detectron2/blob/45b3fcea6e76bf7a351e54e01c7d6e1a3a0100a5/detectron2/modeling/anchor_generator.py#L177
  - the following `view` in `base_anchors.view(1, -1,...
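An infinite loop in a parent-layer search typically arises when the graph being walked contains a cycle and nodes are revisited. The sketch below shows one common guard, a `visited` set, on a toy graph. It is not the actual `ChannelDependency._get_parent_layers` code or the fix from this PR: the `predecessors` mapping, the `is_target` predicate, and the node names are hypothetical.

```python
from collections import deque


def get_parent_layers(node, predecessors, is_target):
    """Collect the nearest ancestors of `node` that satisfy `is_target`.

    `predecessors` maps a node name to the names of its input nodes.
    The `visited` set ensures each node is expanded at most once, so the
    search terminates even if the graph contains a cycle.
    """
    parents = []
    visited = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for pred in predecessors.get(current, []):
            if pred in visited:
                continue            # already handled; skipping prevents endless revisits
            visited.add(pred)
            if is_target(pred):
                parents.append(pred)    # stop at the first matching ancestor on this path
            else:
                queue.append(pred)      # keep walking upward through non-matching nodes
    return parents


# Toy graph with a cycle: add -> relu -> bn -> add.
graph = {"add": ["conv1", "relu"], "relu": ["bn"], "bn": ["add", "conv2"]}
print(get_parent_layers("add", graph, lambda name: name.startswith("conv")))
```

Without the `visited` check, the walk from `add` would reach `bn`, return to `add`, and repeat indefinitely.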

# What does this PR do?

A follow-up PR to #17968. If the attention layer has `self.has_relative_attention_bias == False`, the position bias shape will be wrong: the head number...