
The number of heads

December-boy opened this issue 3 years ago • 10 comments

Thanks for your nice work. I notice that the number of heads set in the Attention module is None; does this mean the heads are set to 4 in the supernet? As listed in the paper, the heads are selected from {3, 6, 12, 16}.

[screenshot: the Attention module code with self.heads = None]

December-boy avatar Jul 21 '21 08:07 December-boy

Thanks for your issue. To implement the training and search of the supernet, we need to set the head number for each batch. Therefore, "head_dim" is only used to compute the scale value (self.scale), while "self.heads = None" is a placeholder that is filled in during the forward process (please refer to core/model/net.py). According to the selected architecture, the value of "self.heads" is changed for each batch of data.
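To make the mechanism concrete, here is a minimal, hypothetical sketch of such an attention block. The class and attribute names mirror common ViT code, but this is an assumption-laden illustration, not the actual core/model/net.py implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Weight-sharing attention block whose head count is assigned per
    batch. Minimal sketch for illustration only, not the actual
    core/model/net.py implementation."""

    def __init__(self, dim, head_dim=64, max_heads=16):
        super().__init__()
        # head_dim only fixes the attention scale; the head count stays
        # undecided (None) until an architecture is sampled for a batch.
        self.scale = head_dim ** -0.5
        self.heads = None
        self.head_dim = head_dim
        self.max_heads = max_heads
        self.qkv = nn.Linear(dim, 3 * max_heads * head_dim)
        self.proj = nn.Linear(max_heads * head_dim, dim)

    def forward(self, x):
        assert self.heads is not None, "sample an architecture first"
        B, N, _ = x.shape
        h, d = self.heads, self.head_dim
        # Compute the full qkv projection, then keep only the first
        # h heads of the shared weights for this batch.
        qkv = self.qkv(x).reshape(B, N, 3, self.max_heads, d)
        q, k, v = qkv[:, :, :, :h].permute(2, 0, 3, 1, 4)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, h * d)
        # Slice the shared output projection to match the sampled width.
        return F.linear(out, self.proj.weight[:, : h * d], self.proj.bias)

# Per-batch usage: the sampled architecture sets heads before forward.
block = Attention(dim=384)
block.heads = 6  # e.g. sampled from {3, 6, 12, 16}
y = block(torch.randn(2, 197, 384))  # -> (2, 197, 384)
```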

xiusu avatar Jul 21 '21 09:07 xiusu

Thanks for your reply

December-boy avatar Jul 21 '21 09:07 December-boy

One more thing: I wonder how to set the selected architecture, since the initial supernet looks like a block-level search space where every block is the same, with 4 heads and a 1440-dimensional output.

December-boy avatar Jul 21 '21 09:07 December-boy

Oh, I figured it out.

December-boy avatar Jul 21 '21 09:07 December-boy

Thanks for your question. To retrain a searched architecture, you can refer to config/retrain/ViTAS_1G_retrain.yaml. As in lines 82 and 122, "net_id" defines the retrained architecture within the pre-set search space (lines 78-81). Alternatively, you can use a pre-defined model as the retrained architecture, as in config/retrain/ViTAS_1.3G_retrain.yaml (lines 80-83); with this setting, you train your defined architecture directly and do not need "net_id" in your yaml.
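For intuition, a hedged sketch of how a net_id could be decoded against a pre-set search space. The choice lists and the per-block (head index, dim index) encoding below are assumptions for illustration, not the actual ViTAS config schema; check the yaml files for the real layout:

```python
# Hypothetical decoding of a "net_id" against a pre-set search space.
search_space = {
    "heads":   [3, 6, 12, 16],
    "out_dim": [320, 640, 960, 1440],
}

def decode_net_id(net_id, space):
    """Turn one (head_idx, dim_idx) pair per block into concrete values."""
    return [
        {"heads": space["heads"][h], "out_dim": space["out_dim"][d]}
        for h, d in net_id
    ]

# A toy 2-block id: block 0 uses 6 heads / 640 dims, block 1 uses 12 / 960.
print(decode_net_id([(1, 1), (2, 2)], search_space))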

xiusu avatar Jul 21 '21 09:07 xiusu

Thanks! Can you tell me the cost of the search, or of the whole process? What type of GPU was used, how many GPUs, and how many days did it take?

December-boy avatar Jul 22 '21 12:07 December-boy

Thanks for your question. I used 32 V100 cards with 32 GB of GPU RAM each to run the search.

xiusu avatar Jul 22 '21 12:07 xiusu

It takes about 2-3 days to search for a ViT architecture.

xiusu avatar Jul 22 '21 12:07 xiusu

I've trained the supernet, and the sampled results are strange. As shown in the following figure, the test accuracy is very low. Is that a normal situation?

[screenshot: evaluation results of sampled subnets]

December-boy avatar Aug 05 '21 03:08 December-boy

Yes, during sampling, the accuracy of a ViT architecture drawn from the supernet is relatively low. Since every candidate inherits the shared supernet weights rather than its own fully trained weights, the absolute accuracy is depressed; the scores mainly serve to rank candidates against each other.
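As a sanity check on what those low numbers are used for, here is a hedged sketch of a sampling-and-ranking loop over the supernet. The set_arch setter, num_blocks attribute, and the search-space dict are hypothetical stand-ins for the repo's actual interfaces, not the ViTAS search code:

```python
import random
import torch

@torch.no_grad()
def evaluate(model, loader):
    """Standard top-1 accuracy on a held-out split."""
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=-1) == y).sum().item()
        total += y.numel()
    return correct / total

def rank_subnets(supernet, loader, space, n_samples=100):
    # Each candidate inherits the shared supernet weights, so its
    # absolute accuracy is low; only the relative ranking matters.
    scored = []
    for _ in range(n_samples):
        arch = [
            {"heads": random.choice(space["heads"]),
             "out_dim": random.choice(space["out_dim"])}
            for _ in range(supernet.num_blocks)  # hypothetical attribute
        ]
        supernet.set_arch(arch)  # hypothetical setter, cf. the sketch above
        scored.append((evaluate(supernet, loader), arch))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored  # the top entry is what gets retrained from scratch
```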

xiusu avatar Aug 05 '21 03:08 xiusu