The training parameters of your single-branch ConvNeXt encoder
Hello, your work is great.
May I ask whether you could share the training parameters of your single-branch ConvNeXt encoder? I have trouble understanding the following part of the code.
def feature_select(self, image_forward_outs):
    if self.select_layer > 100:
        # Multi-layer mode: keep the outputs of the last four stages.
        image_features = image_forward_outs[-4:]
    else:
        # Single-layer mode: keep only the final stage's output.
        image_features = image_forward_outs[-1]
    return image_features
The branch that sets image_features = image_forward_outs[-4:] is not actually used. We directly select the last layer of ConvNeXt to extract visual features. We will revise the code soon.
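For reference, here is a minimal sketch (not the authors' code) of what "selecting the last layer" amounts to with a timm ConvNeXt backbone; the backbone variant and input resolution below are illustrative assumptions:

import timm
import torch

# Illustrative backbone choice; set pretrained=True to load ImageNet weights.
encoder = timm.create_model("convnext_large", pretrained=False, features_only=True)

images = torch.randn(1, 3, 768, 768)   # dummy image batch at an assumed resolution
stage_outputs = encoder(images)        # list of feature maps, one per ConvNeXt stage
image_features = stage_outputs[-1]     # keep only the last stage, as described above
print(image_features.shape)            # torch.Size([1, 1536, 24, 24]) for convnext_large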
Thank you for your answer.
How should the single-branch ConvNeXt in your paper be trained? I tried to train a single-branch ConvNeXt with your code, but the loss in the pretraining stage did not look very good.
Your loss actually looks fine. Single-branch LLaVA-HR simply performs worse; see our paper.
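For anyone reproducing the pretraining stage, here is a hedged sketch of how last-stage ConvNeXt features could be flattened and projected into LLM token embeddings (in LLaVA-style pretraining, the projector is the part being trained). The class name, MLP shape, and dimensions are assumptions for illustration, not LLaVA-HR's actual module:

import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    """Hypothetical two-layer MLP projector; names and sizes are illustrative."""
    def __init__(self, vision_dim: int = 1536, llm_dim: int = 4096):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H*W, C): one visual token per spatial location.
        tokens = feat.flatten(2).transpose(1, 2)
        return self.mlp(tokens)

projector = VisualProjector()
feat = torch.randn(1, 1536, 24, 24)    # last-stage ConvNeXt feature map from above
visual_tokens = projector(feat)
print(visual_tokens.shape)             # torch.Size([1, 576, 4096])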