ru-dalle-paddle

Other playable models-Text2Image

Open Wulx2050 opened this issue 2 years ago • 3 comments

playable models

  1. dalle-mini & craiyon https://github.com/borisdayma/dalle-mini

  2. CogView2 https://github.com/THUDM/CogView2

To be added


No pretrained models

  1. imagen https://github.com/lucidrains/imagen-pytorch

  2. Wenxin (文心) ERNIE-ViLG https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/

To be added

Wulx2050, Jun 24 '22 13:06

If we have enough time, we will try to migrate them. However, I hope Baidu will officially release an open-source text-to-image model for PaddlePaddle. I also know of a popular model trained by Tsinghua University, although it is also a PyTorch implementation. CogView2: https://github.com/THUDM/CogView2

HighCWu, Jun 24 '22 14:06

I just looked it up: Wenxin ERNIE-ViLG's text-to-image generation was evaluated on MS-COCO, a public open-domain dataset. The evaluation metric is FID (lower is better). In both the zero-shot and fine-tuned settings, Wenxin ERNIE-ViLG achieved the best results, far surpassing models such as OpenAI's DALL-E. They provide an entry point for trying the ERNIE-ViLG API, so maybe you could contact the authors and ask them for the pretrained model?
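
For reference, FID (Fréchet Inception Distance) is usually defined as below. This is the standard definition, not something quoted from the ERNIE-ViLG page; the symbols are introduced here for illustration. $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of Inception-v3 features computed over the real and the generated images, respectively:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$

Both terms are non-negative and vanish when the two feature distributions have the same mean and covariance, which is why a lower value indicates generated images that are statistically closer to the real ones.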

Wenxin (文心) ERNIE-ViLG: https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/ Paper: https://arxiv.org/pdf/2112.15283.pdf

Wulx2050, Jun 25 '22 00:06

Another project with code and models

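  • ERNIE-SAT. Category: Wenxin (文心) cross-modal large model. Applications: speech editing, speech generation, voice cloning, and speech-to-speech translation with voice cloning.

ERNIE-SAT is pretrained on Chinese and English datasets using joint speech-text training. This lets the model learn the alignment between speech and text, generate more accurate spectrograms, and synthesize higher-quality audio.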

https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_sat/

Wulx2050, Jun 25 '22 00:06