
The number of heads

December-boy opened this issue 3 years ago • 10 comments

Thanks for your nice work. I notice that the number of heads set in the Attention module is None; does this mean the heads are set to 4 in the supernet? As listed in the paper, the heads are selected from {3, 6, 12, 16}.

[screenshot: the Attention module code with self.heads = None]

December-boy avatar Jul 21 '21 08:07 December-boy

Thanks for your issue. To implement the training and search of the supernet, we need to set the head number for each batch. Therefore, "head_dim" is only used to compute the scale value (self.scale), while "self.heads = None" is a placeholder that is filled in during the forward process (please refer to core/model/net.py). According to the selected architecture, the value of "self.heads" is changed for each batch of data.
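To make the mechanism concrete, here is a minimal, hypothetical sketch of such an attention block. The class and attribute names mirror common ViT code, but this is an assumption-laden illustration, not the actual core/model/net.py implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Weight-sharing attention block whose head count is assigned per
    batch. Minimal sketch for illustration only, not the actual
    core/model/net.py implementation."""

    def __init__(self, dim, head_dim=64, max_heads=16):
        super().__init__()
        # head_dim only fixes the attention scale; the head count stays
        # undecided (None) until an architecture is sampled for a batch.
        self.scale = head_dim ** -0.5
        self.heads = None
        self.head_dim = head_dim
        self.max_heads = max_heads
        self.qkv = nn.Linear(dim, 3 * max_heads * head_dim)
        self.proj = nn.Linear(max_heads * head_dim, dim)

    def forward(self, x):
        assert self.heads is not None, "sample an architecture first"
        B, N, _ = x.shape
        h, d = self.heads, self.head_dim
        # Compute the full qkv projection, then keep only the first
        # h heads of the shared weights for this batch.
        qkv = self.qkv(x).reshape(B, N, 3, self.max_heads, d)
        q, k, v = qkv[:, :, :, :h].permute(2, 0, 3, 1, 4)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, h * d)
        # Slice the shared output projection to match the sampled width.
        return F.linear(out, self.proj.weight[:, : h * d], self.proj.bias)

# Per-batch usage: the sampled architecture sets heads before forward.
block = Attention(dim=384)
block.heads = 6  # e.g. sampled from {3, 6, 12, 16}
y = block(torch.randn(2, 197, 384))  # -> (2, 197, 384)
```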

xiusu avatar Jul 21 '21 09:07 xiusu

Thanks for your reply

December-boy avatar Jul 21 '21 09:07 December-boy

One more thing: I wonder how to set the selected architecture, since the initial supernet looks like a block-level search space where every block is the same, with 4 heads and a 1440-dimensional output.

December-boy avatar Jul 21 '21 09:07 December-boy

Oh, I figured it out.

December-boy avatar Jul 21 '21 09:07 December-boy

Thanks for your question. To retrain a searched architecture, you can refer to config/retrain/ViTAS_1G_retrain.yaml. As in lines 82 and 122, "net_id" defines the retrained architecture within the pre-set search space (lines 78-81). Alternatively, you can use a pre-defined model as the retrained architecture, as in config/retrain/ViTAS_1.3G_retrain.yaml (lines 80-83); with this setting, you train your defined architecture directly and do not need "net_id" in your yaml.
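For intuition, a hedged sketch of how a net_id could be decoded against a pre-set search space. The choice lists and the per-block (head index, dim index) encoding below are assumptions for illustration, not the actual ViTAS config schema; check the yaml files for the real layout:

```python
# Hypothetical decoding of a "net_id" against a pre-set search space.
search_space = {
    "heads":   [3, 6, 12, 16],
    "out_dim": [320, 640, 960, 1440],
}

def decode_net_id(net_id, space):
    """Turn one (head_idx, dim_idx) pair per block into concrete values."""
    return [
        {"heads": space["heads"][h], "out_dim": space["out_dim"][d]}
        for h, d in net_id
    ]

# A toy 2-block id: block 0 uses 6 heads / 640 dims, block 1 uses 12 / 960.
print(decode_net_id([(1, 1), (2, 2)], search_space))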

xiusu avatar Jul 21 '21 09:07 xiusu

Thanks! Can you tell me the cost of the search, or of the whole process? What type of GPU was used, how many GPUs, and how many days did it take?

December-boy avatar Jul 22 '21 12:07 December-boy

Thanks for your question. I used 32 V100 cards with 32 GB of GPU RAM each to run the search.

xiusu avatar Jul 22 '21 12:07 xiusu

It takes about 2-3 days to search for a ViT architecture.

xiusu avatar Jul 22 '21 12:07 xiusu

I've trained the supernet, and the sampled results are strange. As shown in the following figure, the test accuracy is very low. Is that a normal situation?

[screenshot: evaluation results of sampled subnets]

December-boy avatar Aug 05 '21 03:08 December-boy

Yes, during sampling, the accuracy of a ViT architecture drawn from the supernet is relatively low. Since every candidate inherits the shared supernet weights rather than its own fully trained weights, the absolute accuracy is depressed; the scores mainly serve to rank candidates against each other.
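As a sanity check on what those low numbers are used for, here is a hedged sketch of a sampling-and-ranking loop over the supernet. The set_arch setter, num_blocks attribute, and the search-space dict are hypothetical stand-ins for the repo's actual interfaces, not the ViTAS search code:

```python
import random
import torch

@torch.no_grad()
def evaluate(model, loader):
    """Standard top-1 accuracy on a held-out split."""
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=-1) == y).sum().item()
        total += y.numel()
    return correct / total

def rank_subnets(supernet, loader, space, n_samples=100):
    # Each candidate inherits the shared supernet weights, so its
    # absolute accuracy is low; only the relative ranking matters.
    scored = []
    for _ in range(n_samples):
        arch = [
            {"heads": random.choice(space["heads"]),
             "out_dim": random.choice(space["out_dim"])}
            for _ in range(supernet.num_blocks)  # hypothetical attribute
        ]
        supernet.set_arch(arch)  # hypothetical setter, cf. the sketch above
        scored.append((evaluate(supernet, loader), arch))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored  # the top entry is what gets retrained from scratch
```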

xiusu avatar Aug 05 '21 03:08 xiusu