Alfred Increment

Results: 11 issues by Alfred Increment

I cannot understand the attention in the middle block of the U-Net. Which is correct: self-attention or cross-attention? According to the comment, the middle block of the U-Net "always uses a...

question
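
For context, a generic sketch (not the repository's code) of the distinction being asked about: a layer is self-attention when the keys and values are computed from the image features themselves, and cross-attention when they come from an external context such as the text embedding. The same module can serve both roles depending on whether a context tensor is passed:

```python
import torch
import torch.nn.functional as F
from torch import nn

class Attention(nn.Module):
    """Self-attention when context is None; cross-attention otherwise."""

    def __init__(self, dim, context_dim=None, heads=8):
        super().__init__()
        context_dim = context_dim or dim  # no context dim -> self-attention
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(context_dim, dim, bias=False)
        self.to_v = nn.Linear(context_dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x, context=None):
        ctx = x if context is None else context  # <- the entire difference
        q, k, v = self.to_q(x), self.to_k(ctx), self.to_v(ctx)

        def split_heads(t):
            b, n, d = t.shape
            return t.view(b, n, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(
            split_heads(q), split_heads(k), split_heads(v)
        )
        out = out.transpose(1, 2).reshape(x.shape[0], x.shape[1], -1)
        return self.to_out(out)

x = torch.randn(1, 64, 320)      # flattened image feature map
text = torch.randn(1, 77, 1024)  # text-encoder output

self_attn = Attention(320)                     # K/V come from the image itself
cross_attn = Attention(320, context_dim=1024)  # K/V come from the text
print(self_attn(x).shape, cross_attn(x, context=text).shape)
```

In Stable Diffusion's transformer blocks the first attention layer is self-attention and the second receives the text embeddings as context, so whether a block counts as "self" or "cross" depends on which layer the comment refers to.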

Your work is great! I have easily generated anime with my model, Cool Japan Diffusion 1.x. ![test_](https://user-images.githubusercontent.com/3625196/215487593-de25bd0b-5982-4338-8446-3f604f71c53e.gif) ![test](https://user-images.githubusercontent.com/3625196/215487536-f13a1956-ef30-4ebe-b5ad-4710cef25311.gif) I would like to generate anime with Stable Diffusion 2.1 because I...

I would like to use ControlNet on SD 2.1 because I fine-tune SD 2.1. My models, Picasso Diffusion and Cool Japan Diffusion, as well as Waifu Diffusion 1.5, are made of...
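
For anyone trying the same thing, a minimal sketch of how diffusers pairs an SD 2.1 derivative with an SD 2.1-trained ControlNet; both repo ids below are placeholders chosen for illustration, not something confirmed by the issue:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Assumptions: an SD 2.1-compatible community ControlNet exists under this id,
# and "my-org/my-sd21-finetune" is a hypothetical diffusers-format fine-tune.
controlnet = ControlNetModel.from_pretrained(
    "thibaud/controlnet-sd21-canny-diffusers", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "my-org/my-sd21-finetune",  # hypothetical SD 2.1 derivative
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny = load_image("canny_edges.png")  # a pre-computed Canny edge map
image = pipe("an anime girl, masterpiece", image=canny).images[0]
image.save("out.png")
```

The key constraint is that the ControlNet must have been trained against the same base architecture (SD 2.1) as the fine-tune it is attached to.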

Please change the titles of the ukiyo-e to English instead of numbers. I have attached a file containing the titles of the ukiyo-e from ukiyo-e.org. I am Japanese, so I want to...

enhancement

Thanks to your advice, I was able to fine-tune your model on the Zudamon Anime dataset. (I am permitted to use the authorized Zudamon Anime dataset.) https://github.com/PKU-YuanGroup/Open-Sora-Plan/assets/3625196/a76c1ebc-96ab-4b65-9f9b-7469b158d432 However, I could only caption a few videos manually...

I would like to do v2v (video-to-video) with your model. I think we need to add two things to opensora/sample/pipeline_videogen.py. 1. Create an encode_videos function like the following: ```python def encode_videos(self, videos):...
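
The snippet above is cut short; below is a hedged sketch of what such a helper might look like, assuming the pipeline exposes its video VAE as self.vae, takes pixel values in [0, 1], and that self.vae.encode returns the latent tensor directly (names and conventions are assumptions, not Open-Sora-Plan's actual API):

```python
import torch

def encode_videos(self, videos: torch.Tensor) -> torch.Tensor:
    """Encode pixel-space videos of shape (B, C, T, H, W) into VAE latents.

    Assumptions for illustration only: self.vae is the pipeline's video VAE,
    inputs arrive in [0, 1], and encode() returns latents directly.
    """
    videos = videos.to(device=self.vae.device, dtype=self.vae.dtype)
    videos = videos * 2.0 - 1.0  # map [0, 1] -> [-1, 1], as for image VAEs
    with torch.no_grad():
        latents = self.vae.encode(videos)  # (B, C_latent, T', H', W')
    return latents
```

In a typical v2v flow these latents would then be noised to an intermediate timestep and denoised from there, the same way img2img works for images.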

First of all, your work, CLIP Interrogator, is great! However, it seems that the tokenizer of SD 2.0 is different from that of ViT-H. For example, here is the test code...
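
A small reproduction sketch along those lines, assuming open_clip and transformers are installed; the SD checkpoint id is one common choice, not necessarily the one tested in the issue:

```python
import open_clip
from transformers import CLIPTokenizer

prompt = "a beautiful painting of a castle"

# Tokenizer open_clip uses for ViT-H-14 (the text tower SD 2.x was trained against)
oc_tokenizer = open_clip.get_tokenizer("ViT-H-14")
oc_ids = oc_tokenizer([prompt])[0]  # tensor of shape (77,)

# Tokenizer bundled with an SD 2.x diffusers checkpoint (repo id is an example)
hf_tokenizer = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="tokenizer"
)
hf_ids = hf_tokenizer(
    prompt, padding="max_length", max_length=77, return_tensors="pt"
).input_ids[0]

print(oc_ids.tolist())
print(hf_ids.tolist())
# Any divergence (special tokens, padding behavior, truncation) shows up
# directly when comparing the two id sequences.
```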

Hello, I find your repository very insightful and valuable. While exploring, I noticed some similarities between your methodology and the one found in https://github.com/flaribbit/imgfind. If possible, could you elucidate the...

This optimizer is memory-efficient: we can pretrain Mistral-7B with 24 GB of VRAM. https://github.com/jiaweizzhao/GaLore ![image](https://github.com/kohya-ss/sd-scripts/assets/3625196/d3bb5964-6f60-42f2-8a5c-4dc22959700a)
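
A minimal usage sketch following the GaLore README, with a toy model standing in for a 7B transformer; the hyperparameters are the README's illustrative defaults, not a tuned recipe:

```python
import torch
from torch import nn
from galore_torch import GaLoreAdamW  # pip install galore-torch

# Toy model standing in for a large transformer
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# GaLore projects the gradients of 2D weight matrices into a low-rank
# subspace and keeps the Adam moments there, which is where the
# optimizer-state memory savings come from.
galore_params = [p for p in model.parameters() if p.ndim == 2]
other_params = [p for p in model.parameters() if p.ndim != 2]

optimizer = GaLoreAdamW(
    [
        {"params": other_params},
        {"params": galore_params, "rank": 128, "update_proj_gap": 200,
         "scale": 0.25, "proj_type": "std"},
    ],
    lr=1e-2,
)

loss = model(torch.randn(4, 512)).pow(2).mean()
loss.backward()
optimizer.step()
```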

Thank you for open-sourcing your work. By the way, I ran this code: ```python from diffusers import DiffusionPipeline, StableDiffusionPipeline import torch from consistencydecoder import ConsistencyDecoder, save_image, load_image pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", custom_pipeline="latent_consistency_txt2img", custom_revision="main", revision="fb9c5d") decoder_consistency...
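
The snippet is truncated; below is a hedged sketch of one way to wire the consistency decoder into the LCM pipeline. It assumes the custom pipeline accepts output_type="latent" like the standard diffusers pipelines and that the returned latents carry the VAE scaling factor; both are assumptions, not confirmed by the issue:

```python
import torch
from diffusers import DiffusionPipeline
from consistencydecoder import ConsistencyDecoder, save_image

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    custom_pipeline="latent_consistency_txt2img",
    custom_revision="main",
).to("cuda")

decoder_consistency = ConsistencyDecoder(device="cuda:0")

# Ask the pipeline for latents instead of decoded images, then decode them
# with the consistency decoder instead of the GAN-trained VAE decoder.
# Dividing by the scaling factor assumes the pipeline returns scaled latents.
latents = pipe(
    "a photo of a cat", num_inference_steps=4, output_type="latent"
).images
sample = decoder_consistency(latents / pipe.vae.config.scaling_factor)
save_image(sample, "consistency_decoded.png")
```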