Yuzhou comments


inference issue

Please use LLaVA-1.1 rather than LLaVA-1.5. It can be found in the official LLaVA GitHub repository.

testing the editing performance directly using stage 1's ckpt

Yes. Although we did not have enough space in the main paper to discuss the importance of the first-stage textual alignment training, we conducted an ablation study that...

testing the editing performance directly using stage 1's ckpt

The role of stage-2 training is to transfer the MLLM's ability to the diffusion model. Stage-1 only aligns the MLLM with CLIP, which means the MLLM can create...

LLaVA version

Sure, other versions such as LLaVA-1.5 can be used, but you may need to check the slight differences between LLaVA versions first.

there are lots of bugs in TrainStage1

Thanks for your interest in our work. There may be some small typos in the code we pushed to GitHub; you can simply fix them for your own use.

Trainstage1 parent dir was not created

Thanks for your interest in our work. There may be some small typos in the code we pushed to GitHub; you can simply fix them for your own use.
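
For readers hitting the missing-parent-directory error named in the issue title, one common fix is to create the directory before writing to it. The snippet below is a minimal sketch, not the repository's code: the helper name `ensure_parent_dir` and the example path are hypothetical, and where the call belongs in the stage-1 training script is an assumption that depends on where the script writes its outputs.

```python
import os


def ensure_parent_dir(file_path):
    """Create the parent directory of file_path if it does not exist yet.

    Hypothetical helper: call it right before any save/write that targets
    file_path. Returns the absolute path of the parent directory.
    """
    parent = os.path.dirname(os.path.abspath(file_path))
    # exist_ok=True makes the call safe to repeat and safe when the
    # directory is already there.
    os.makedirs(parent, exist_ok=True)
    return parent


# Example (hypothetical path): make sure the folder exists before saving.
# ensure_parent_dir("./checkpoints/stage1/model.bin")
```

Calling the helper immediately before each write keeps the fix local, so the rest of the script does not need to change.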

Qformer mm_projector issue

Thanks for your interest in our work. You might be right; this may be a small error introduced when we pushed the code to GitHub. It works well after the modification.

link for synthetic editing dataset

Thanks for your interest in our work. The link has never changed; please take another look.

mask for each image in the Reason-Edit evaluation benchmark

Thanks for your interest in our work. The masks are not used during inference; they are used only to compute the evaluation metrics over the target editing area.
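
To illustrate what restricting a metric to a masked region can look like, here is a minimal sketch. It makes several assumptions that the comment does not confirm: the mask is a binary array with the same height and width as the images, PSNR stands in as an example metric, and the function name `masked_psnr` is hypothetical. The benchmark's actual metrics and mask conventions may differ.

```python
import numpy as np


def masked_psnr(edited, reference, mask, max_value=255.0):
    """PSNR computed only over the pixels where mask is nonzero.

    edited, reference: arrays of shape (H, W) or (H, W, C).
    mask: array of shape (H, W); nonzero marks the region to evaluate
          (assumed convention, not confirmed by the benchmark).
    """
    edited = np.asarray(edited, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    region = np.asarray(mask).astype(bool)

    if edited.shape != reference.shape:
        raise ValueError("edited and reference must have the same shape")
    if region.shape != edited.shape[:2]:
        raise ValueError("mask must match the image height and width")
    if not region.any():
        raise ValueError("mask selects no pixels")

    # Boolean indexing keeps only the selected pixels (all channels).
    diff = edited[region] - reference[region]
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical inside the region
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Passing the inverted mask (`~mask` for a boolean array) evaluates the complementary region instead, which is how the same helper could score the area that should stay unchanged.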

model checkpoint

Thanks for your interest in our work. Unfortunately, these checkpoints were cleaned up before I left Tencent. The training process of InstructPix2Pix and MagicBrush with our data is simple. We adopt...