
the training parameters of your single branch convnext encoder

Open yuecao0119 opened this issue 1 year ago • 3 comments

Hello, your work is great.

Could you share the training parameters of your single-branch ConvNeXT encoder? I also don't quite understand the following part of the code.

def feature_select(self, image_forward_outs):
    if self.select_layer > 100:
        image_features = image_forward_outs[-4:]
    else:
        image_features = image_forward_outs[-1]
    return image_features

yuecao0119 avatar Mar 08 '24 12:03 yuecao0119

The code path image_features = image_forward_outs[-4:] is not actually used. We directly select the last layer of ConvNeXT to extract visual features. We will revise the code soon.
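Based on that reply, a minimal sketch of the effective behavior (the multi-layer branch removed, only the last ConvNeXT stage kept) might look like this; `image_forward_outs` is assumed to be a sequence of per-layer feature tensors:

```python
def feature_select(image_forward_outs):
    """Sketch of the simplified selection described above.

    Only the final ConvNeXT layer's output is used as the visual
    feature; the [-4:] multi-layer slice is dead code in practice.
    """
    return image_forward_outs[-1]


# Toy usage with placeholder per-layer outputs:
layers = ["stage1_feat", "stage2_feat", "stage3_feat", "stage4_feat"]
print(feature_select(layers))  # the last stage's features
```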

luogen1996 avatar Mar 08 '24 12:03 luogen1996

Thank you for your answer. How should the single-branch ConvNeXT in your paper be trained? I tried to train it with your code, but the loss in the pretraining stage did not look very good. [screenshot: pretraining loss curve]

yuecao0119 avatar Mar 08 '24 14:03 yuecao0119

Your loss actually looks fine. Single-branch LLaVA-HR does perform worse; see our paper. [image attachment]

luogen1996 avatar Mar 08 '24 14:03 luogen1996