Qingpei Guo
The input resolution is 224×224; for more details, please refer to Table 10 in our paper: https://arxiv.org/pdf/2401.15896v2.pdf
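For reference, a minimal preprocessing sketch that resizes an image to that 224×224 input resolution. This is not the repository's exact pipeline, and the CLIP-style normalization statistics here are an assumption rather than values taken from the paper:

```python
from PIL import Image
from torchvision import transforms

# Standard resize/center-crop to 224x224; normalization stats are assumed CLIP defaults.
preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])

image = preprocess(Image.open("example.jpg").convert("RGB"))  # shape: (3, 224, 224)
```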
> How do I load the 10B model? Is it open-sourced yet?

All models are released here: https://www.modelscope.cn/organization/M_Square. Meanwhile, we provide a simple example at https://github.com/alipay/Ant-Multi-Modal-Framework/blob/main/prj/M2_Encoder/run.py. By modifying cfg['model_config'], you...
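A hedged sketch of what that config change might look like. `snapshot_download` is ModelScope's download helper, but the `model_config` keys ('model_name', 'ckpt_path') and the checkpoint file name below are illustrative assumptions; check run.py for the actual field names:

```python
from modelscope.hub.snapshot_download import snapshot_download

# Download released weights from ModelScope (model id based on the links above;
# the exact id for the 10B variant may differ).
local_dir = snapshot_download('M2Cognition/M2-Encoder')

# Illustrative edit to run.py's config; the keys below are assumptions,
# not the framework's guaranteed schema.
cfg = {
    'model_config': {
        'model_name': 'M2-Encoder-10B',                    # assumed field/value
        'ckpt_path': f'{local_dir}/m2_encoder_10B.ckpt',   # assumed checkpoint file name
    },
}
```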
1. Indeed, the file 'm2_encoder_0.2B.ckpt' corresponds to the 0.4-billion-parameter model referenced in our publication; apologies for the discrepancy in naming.
2. Certainly, we are committed to making...
Fixed the issue in this commit: https://github.com/alipay/Ant-Multi-Modal-Framework/commit/b38e199ecca0989c84fcba49f44823a1024a14a7, and the naming on ModelScope: https://www.modelscope.cn/models/M2Cognition/M2-Encoder/files
We have released the 1B and 10B models in this PR: #14
We use GLM's tokenizer. The overall model architecture follows BEiT-3: https://arxiv.org/abs/2208.10442, and the pre-training task design follows SyCoCa: https://arxiv.org/abs/2401.02137. Because the architecture was modified, there are no pretrained weights available to load for the text and image encoders; both are randomly initialized and trained from scratch during pre-training.
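To make the last point concrete, here is a purely illustrative PyTorch sketch (hypothetical module names, not the repository's actual classes) of both encoders starting from random initialization instead of loading pretrained weights:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the modified encoders; not the repo's real classes.
class ImageEncoder(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Patch embedding; PyTorch's default initializers give random weights,
        # i.e. nothing is copied from a pretrained vision backbone.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=150_000, dim=768):
        super().__init__()
        # The vocabulary follows the GLM tokenizer, but the embedding table
        # itself starts from random values (no pretrained GLM weights loaded).
        self.embed = nn.Embedding(vocab_size, dim)

image_enc, text_enc = ImageEncoder(), TextEncoder()
# Pre-training then learns all parameters of both towers from scratch.
optimizer = torch.optim.AdamW(
    list(image_enc.parameters()) + list(text_enc.parameters()), lr=1e-4)
```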