CLIP-featurevis

Try to visualize the features of T5 (Text-To-Text Transfer Transformer)?

Open Wulx2050 opened this issue 2 years ago • 0 comments

Imagen's key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.

They also find that while T5-XXL and CLIP text encoders perform similarly on simple benchmarks such as MS-COCO, human evaluators prefer T5-XXL encoders over CLIP text encoders in both image-text alignment and image fidelity on DrawBench, a set of challenging and compositional prompts.

So, could you try visualizing the features of T5 as well?

T5: https://github.com/google-research/text-to-text-transfer-transformer

Wulx2050, Jun 23 '22 08:06