AnyText: Multilingual Visual Text Generation And Editing

Open sayakpaul opened this issue 1 year ago • 13 comments

Model/Pipeline/Scheduler description

From the repository:

AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy.
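As a rough illustration of the text embedding module described above, here is a minimal sketch of one way stroke embeddings could be blended with caption embeddings: each placeholder token in the caption has its embedding swapped for an embedding produced from the rendered text. The placeholder symbol, the function name, and the toy two-dimensional vectors are all hypothetical; the substitution strategy is an assumption about what "blend" means here, not the repository's confirmed implementation.

```python
from typing import List, Sequence

# Hypothetical marker standing in for one line of text to be rendered.
PLACEHOLDER = "*"

Vector = List[float]


def blend_caption_with_glyphs(
    tokens: Sequence[str],
    token_embeddings: Sequence[Sequence[float]],
    glyph_embeddings: Sequence[Sequence[float]],
) -> List[Vector]:
    """Swap each placeholder token's embedding for the next glyph embedding.

    Assumption: blending is a plain substitution at placeholder positions;
    all other caption embeddings pass through unchanged.
    """
    if len(tokens) != len(token_embeddings):
        raise ValueError("one embedding per token is required")
    if sum(t == PLACEHOLDER for t in tokens) != len(glyph_embeddings):
        raise ValueError("one glyph embedding per placeholder is required")

    glyphs = iter(glyph_embeddings)
    blended: List[Vector] = []
    for token, embedding in zip(tokens, token_embeddings):
        # Placeholders take the OCR-derived vector; other tokens keep their own.
        source = next(glyphs) if token == PLACEHOLDER else embedding
        blended.append(list(source))
    return blended


# Toy data: four ordinary caption tokens followed by one placeholder.
tokens = ["a", "sign", "that", "says", PLACEHOLDER]
token_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.0, 0.0]]
glyph_embeddings = [[9.0, 9.5]]  # stand-in for an OCR encoder output
blended = blend_caption_with_glyphs(tokens, token_embeddings, glyph_embeddings)
```

In a real pipeline the vectors would come from a text encoder and an OCR feature extractor and would share the same hidden size; the sketch only shows the bookkeeping around placeholder positions.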

Open source status

  • [X] The model implementation is available.
  • [X] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Repository: https://github.com/tyxsspa/AnyText

Paper: https://arxiv.org/abs/2311.03054

Weights and inference code: https://modelscope.cn/models/damo/cv_anytext_text_generation_editing/summary

sayakpaul avatar Dec 31 '23 06:12 sayakpaul

I'd like to work on that

coding-famer avatar Dec 31 '23 19:12 coding-famer

Yes, sure! Feel free to let us know if you need any help.

For starters, I think it might be better to add this to research_projects similar to ControlNetXS.

We might not be able to add to community because AnyText has modelling components.

Does this make sense? If we see enough usage, we can include it in the core.

sayakpaul avatar Jan 01 '24 02:01 sayakpaul

Hi @coding-famer. Have you been able to make progress on this? I'd very much like to be able to use this with diffusers, and would like to help where I can. From the pipeline perspective, I understand most of the code and have made some significant progress. From the modelling perspective, I'm not too sure about what new additions need to be made as I'm still navigating the codebase.

This is a link to the converted AnyText model on Hugging Face, which might be of help. It took me a very long time (~18 hours) to download from the ModelScope hub servers, which I assume are located in China. I'm hoping the conversion to diffusers format was correct. I'm still looking into it and don't have the full picture yet, but it seems that different weights are used in the CLIP encoder depending on the embedding type (see the link below). The ocr and vit embedding types only seem to be useful for text editing, which could probably be added sometime in the future; for now, replicating the text-generation part would be great.

https://github.com/tyxsspa/AnyText/blob/cd8924720896462ad61e2adaf086b669340207e0/cldm/embedding_manager.py#L75

a-r-r-o-w avatar Jan 13 '24 11:01 a-r-r-o-w

Hi, I'm still working on this. Happy to do it together.

coding-famer avatar Jan 15 '24 19:01 coding-famer

Hey, sorry for the late response. I got caught up with other PRs and looking into other interesting work. Would Discord be okay for communication if you're still progressing on this?

a-r-r-o-w avatar Feb 04 '24 04:02 a-r-r-o-w

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Feb 28 '24 15:02 github-actions[bot]

Contributions are still welcome.

sayakpaul avatar Feb 28 '24 15:02 sayakpaul

@sayakpaul can I work on this?

tuanh123789 avatar Mar 08 '24 09:03 tuanh123789

Sure, we can start with a community pipeline :)

sayakpaul avatar Mar 08 '24 09:03 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 01 '24 15:04 github-actions[bot]

not stale

bghira avatar Apr 01 '24 17:04 bghira

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 26 '24 15:04 github-actions[bot]

not stale

bghira avatar May 01 '24 11:05 bghira

Can I work on this community pipeline?

Edit 1: I have been busy for the past several weeks because of personal issues. From now on, I am fully focused on this. Sorry for holding up this pipeline.

Edit 2: I now largely understand the pipeline, and I am trying to convert the checkpoint into diffusers' format. It has a ControlNet model and several other special components.

tolgacangoz avatar Jun 06 '24 19:06 tolgacangoz

Yes, you can. Thank you :)

sayakpaul avatar Jun 07 '24 01:06 sayakpaul