Qingpei Guo
The input resolution is 224×224; for more details, please refer to Table 10 in our paper: https://arxiv.org/pdf/2401.15896v2.pdf
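For reference, a minimal preprocessing sketch that resizes an image to that 224×224 input resolution. This is not the repository's exact pipeline, and the CLIP-style normalization statistics here are an assumption rather than values taken from the paper:

```python
from PIL import Image
from torchvision import transforms

# Standard resize/center-crop to 224x224; normalization stats are assumed CLIP defaults.
preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])

image = preprocess(Image.open("example.jpg").convert("RGB"))  # shape: (3, 224, 224)
```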
> How do I load the 10B model? Is it open-sourced yet?

All models are released here: https://www.modelscope.cn/organization/M_Square. Meanwhile, we provide a simple example at https://github.com/alipay/Ant-Multi-Modal-Framework/blob/main/prj/M2_Encoder/run.py. By modifying cfg['model_config'], you...
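A hedged sketch of what that config change might look like. `snapshot_download` is ModelScope's download helper, but the `model_config` keys ('model_name', 'ckpt_path') and the checkpoint file name below are illustrative assumptions; check run.py for the actual field names:

```python
from modelscope.hub.snapshot_download import snapshot_download

# Download released weights from ModelScope (model id based on the links above;
# the exact id for the 10B variant may differ).
local_dir = snapshot_download('M2Cognition/M2-Encoder')

# Illustrative edit to run.py's config; the keys below are assumptions,
# not the framework's guaranteed schema.
cfg = {
    'model_config': {
        'model_name': 'M2-Encoder-10B',                    # assumed field/value
        'ckpt_path': f'{local_dir}/m2_encoder_10B.ckpt',   # assumed checkpoint file name
    },
}
```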
1. Indeed, the file 'm2_encoder_0.2B.ckpt' corresponds to the 0.4-billion-parameter model referenced in our publication; apologies for the discrepancy in naming.
2. Certainly, we are committed to making...
Fixed the issue in this commit: https://github.com/alipay/Ant-Multi-Modal-Framework/commit/b38e199ecca0989c84fcba49f44823a1024a14a7, and the naming on ModelScope: https://www.modelscope.cn/models/M2Cognition/M2-Encoder/files
We have released the 1B and 10B models in this PR: #14
We use GLM's tokenizer. The overall model architecture follows BEiT-3: https://arxiv.org/abs/2208.10442, and the pre-training task design follows SyCoCa: https://arxiv.org/abs/2401.02137. Because the architecture was modified, there are no pretrained weights available to load for the text and image encoders; both are randomly initialized and trained from scratch during pre-training.
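To make the last point concrete, here is a purely illustrative PyTorch sketch (hypothetical module names, not the repository's actual classes) of both encoders starting from random initialization instead of loading pretrained weights:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the modified encoders; not the repo's real classes.
class ImageEncoder(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Patch embedding; PyTorch's default initializers give random weights,
        # i.e. nothing is copied from a pretrained vision backbone.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=150_000, dim=768):
        super().__init__()
        # The vocabulary follows the GLM tokenizer, but the embedding table
        # itself starts from random values (no pretrained GLM weights loaded).
        self.embed = nn.Embedding(vocab_size, dim)

image_enc, text_enc = ImageEncoder(), TextEncoder()
# Pre-training then learns all parameters of both towers from scratch.
optimizer = torch.optim.AdamW(
    list(image_enc.parameters()) + list(text_enc.parameters()), lr=1e-4)
```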