SUR-adapter
ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities from large language models to build a high-quality textual sem...
Many thanks to the authors for your contribution; your research has been very inspiring and I would like to try your model. However, Google Drive cannot be accessed from China, so I am unable to download the dataset and vectors. Is there another way to download them?
Can you please share the adapter checkpoint `checkpoints/runwayml_fp16/test_llm13B_llml39_lr1e-05_llmw1e-05_promptw1e-05_adapterw0.1/adapter_checkpoint1000.pt`? The one on Google Drive (https://drive.google.com/drive/folders/1UyC9_AqTezmHXmj4dh0A-9RBKKx_JmJZ) is based on "SG161222/Realistic_Vision_V2.0".
Hi, nice work! Would you please share the 15 testing prompts for each type (counting, color, action)? Thanks!
Colab?
Hello, do you plan to release a Colab notebook for testing? I'd like to see how more cases turn out.
Excuse me, will the model be open-sourced? Looking forward to trying it out.
Lexica, Civitai, and Stable Diffusion Online have a large number of images. May I ask what criteria or keywords you used to select and collect the 114,148 image-text pairs...
First of all, thank you for your paper! It was very inspiring. I have one point of confusion about extracting the semantic features from the LLaMA model. You mentioned these two lines: `for layer in self.layers: h = layer(h, start_pos, freqs_cis, mask)`. I used your example input "a colorful animal with big eyes on a blue background", but when I print the shape of the final output of layer 40, `h` has shape [1, 12, 5120], so each word clearly has its own [5120]-dimensional vector. Yet your sur_data_small contains a single [5120] vector per prompt. Why is that? Which position of the LLM's hidden states should I take as the semantic feature? Looking forward to your reply.
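For context on the shape mismatch above: the layer output is [batch, seq_len, hidden] = [1, 12, 5120], while the released data stores one [5120] vector per prompt, which suggests the per-token features were collapsed over the sequence dimension. A minimal sketch of two common ways to do that (mean pooling, or taking the last token's state); this is an illustration of the shape arithmetic, not a confirmation of which position the authors actually used:

```python
import torch

# Stand-in for the layer-40 LLaMA-13B output: [batch, seq_len, hidden]
h = torch.randn(1, 12, 5120)

# Option A: average the hidden states over all tokens -> [5120]
mean_pooled = h.mean(dim=1).squeeze(0)

# Option B: take the final token's hidden state -> [5120]
last_token = h[0, -1, :]

print(mean_pooled.shape)  # torch.Size([5120])
print(last_token.shape)   # torch.Size([5120])
```

Either reduction yields a single [5120] vector matching the shape found in sur_data_small; which one (or another, e.g. a specific token position) matches the released vectors would need to be confirmed by the authors.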
The inference code I used is below; the weights are from https://drive.google.com/drive/folders/1UyC9_AqTezmHXmj4dh0A-9RBKKx_JmJZ

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from SUR_adapter_pipeline import SURStableDiffusionPipeline
from SUR_adapter import Adapter

# Load the adapter checkpoint and set its blending weight
adapter_path = "adapter_checkpoint.pt"
adapter = Adapter().to("cuda")
adapter.load_state_dict(torch.load(adapter_path))
adapter.adapter_weight = 0.1

# Build the SUR-adapter diffusion pipeline on top of SD v1.5
model_path = "runwayml/stable-diffusion-v1-5"
pipe = SURStableDiffusionPipeline.from_pretrained(model_path, adapter=adapter)
pipe.to("cuda")
...
```