diffusers AnyText: Multilingual Visual Text Generation And Editing

Model/Pipeline/Scheduler description

AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy.
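As a rough illustration of how the text embedding module might blend glyph embeddings with caption embeddings, here is a minimal sketch. It assumes the blending is a substitution at placeholder positions; the description above does not spell this out. The placeholder marker, the function name `blend_embeddings`, and the toy vectors are all hypothetical and are not taken from the AnyText codebase.

```python
from typing import List, Sequence

Vector = List[float]

# Hypothetical marker meaning "a rendered text line belongs at this position".
PLACEHOLDER = "*"


def blend_embeddings(
    tokens: Sequence[str],
    token_embeddings: Sequence[Vector],
    glyph_embeddings: Sequence[Vector],
) -> List[Vector]:
    """Swap each placeholder token's embedding for the next glyph embedding."""
    if len(tokens) != len(token_embeddings):
        raise ValueError("tokens and token_embeddings must have the same length")
    n_slots = sum(1 for t in tokens if t == PLACEHOLDER)
    if n_slots != len(glyph_embeddings):
        raise ValueError(
            f"{n_slots} placeholders but {len(glyph_embeddings)} glyph embeddings"
        )
    glyphs = iter(glyph_embeddings)
    blended: List[Vector] = []
    for token, emb in zip(tokens, token_embeddings):
        # Placeholder positions take the glyph (OCR-derived) embedding;
        # every other position keeps its caption embedding unchanged.
        blended.append(list(next(glyphs)) if token == PLACEHOLDER else list(emb))
    return blended


# Toy data: 4-dimensional embeddings, one text line to render.
tokens = ["a", "sign", "that", "says", PLACEHOLDER]
caption = [[float(i)] * 4 for i in range(len(tokens))]
glyph = [[9.0, 9.0, 9.0, 9.0]]
mixed = blend_embeddings(tokens, caption, glyph)
```

In a real pipeline the glyph embeddings would come from the OCR encoder and the caption embeddings from the text encoder; here both are stand-in lists so the data flow is easy to follow.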

Open source status

[X] The model implementation is available.
[X] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Repository: https://github.com/tyxsspa/AnyText

Paper: https://arxiv.org/abs/2311.03054

Weights and inference code: https://modelscope.cn/models/damo/cv_anytext_text_generation_editing/summary

Dec 31 '23 06:12 sayakpaul

I'd like to work on that

Dec 31 '23 19:12 coding-famer

Yes, sure! Feel free to let us know if you need any help.

For starters, I think it might be better to add this to research_projects similar to ControlNetXS.

We might not be able to add to community because AnyText has modelling components.

Does this make sense? If we see enough usage, we can include it in the core.

Jan 01 '24 02:01 sayakpaul

Hi @coding-famer. Have you been able to make progress on this? I'd very much like to be able to use this with diffusers, and would like to help where I can. From the pipeline perspective, I understand most of the code and have made some significant progress. From the modelling perspective, I'm not too sure about what new additions need to be made as I'm still navigating the codebase.

This is a link to the converted AnyText model on Hugging Face, which might be of help. It took me a very long time (~18 hours) to download from the ModelScope hub servers, which I assume are located in China. I'm hoping the conversion to diffusers format was correct. I'm still looking into it and don't have the full picture yet, but it seems that different weights are used in the CLIP encoder depending on the embedding type (see the link below). The ocr and vit types only seem to be useful for text editing, which could probably be done sometime in the future; for now, replicating the text-generation part would be great.

https://github.com/tyxsspa/AnyText/blob/cd8924720896462ad61e2adaf086b669340207e0/cldm/embedding_manager.py#L75

Jan 13 '24 11:01 a-r-r-o-w

Hi, I'm still working on this. Happy to do it together.

Jan 15 '24 19:01 coding-famer

Hey, sorry for the late response. I got caught up with other PRs and looking into other interesting work. Would Discord be okay for communication if you're still progressing on this?

Feb 04 '24 04:02 a-r-r-o-w

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Feb 28 '24 15:02 github-actions[bot]

Contributions are still welcome.

Feb 28 '24 15:02 sayakpaul

@sayakpaul can I work on this?

Mar 08 '24 09:03 tuanh123789

Sure, we can start with a community pipeline :)

Mar 08 '24 09:03 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Apr 01 '24 15:04 github-actions[bot]

not stale

Apr 01 '24 17:04 bghira

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Apr 26 '24 15:04 github-actions[bot]

not stale

May 01 '24 11:05 bghira

Can I work on this community pipeline?

Edit 1: I have been busy for the past several weeks because of personal issues. From now on, I am fully focused on this. Sorry for holding up this pipeline so far.

Edit 2: I now largely understand the pipeline and am trying to convert the checkpoint into the diffusers format. It has a ControlNet model and several other special components.
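For readers following along, one common first step when converting a single-file checkpoint is to split its state dict into per-component dicts by key prefix. The sketch below is only that, under stated assumptions: the prefixes follow the usual layout of ControlNet-style checkpoints and have not been verified against the AnyText weights, and `split_state_dict` is a hypothetical helper, not a diffusers API.

```python
from collections import defaultdict
from typing import Any, Dict, Mapping

# Assumed prefix -> component mapping; the real AnyText checkpoint keys may differ.
PREFIX_TO_COMPONENT = {
    "control_model.": "controlnet",
    "model.diffusion_model.": "unet",
    "first_stage_model.": "vae",
    "cond_stage_model.": "text_encoder",
}


def split_state_dict(state_dict: Mapping[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group checkpoint entries by component and strip the matched prefix."""
    components: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for key, value in state_dict.items():
        for prefix, name in PREFIX_TO_COMPONENT.items():
            if key.startswith(prefix):
                components[name][key[len(prefix):]] = value
                break
        else:
            # Keep anything unrecognised so special components are not silently lost.
            components["unmatched"][key] = value
    return dict(components)


# Toy checkpoint: the integer values stand in for tensors.
toy_checkpoint = {
    "control_model.input_hint_block.0.weight": 1,
    "model.diffusion_model.out.2.bias": 2,
    "first_stage_model.decoder.conv_in.weight": 3,
    "embedding_manager.proj.weight": 4,
}
parts = split_state_dict(toy_checkpoint)
```

Collecting unmatched keys in their own bucket makes it easy to see which of the special components still need a dedicated conversion path.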

Jun 06 '24 19:06 tolgacangoz

Yes, you can. Thank you :)

Jun 07 '24 01:06 sayakpaul
