[Feature Request]: Add "CLIP Interrogator" image-to-prompt to Fooocus

Open badraymen opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do?

After using the Fooocus "Describe" tool, I found that the generated prompts are missing words and sentences, are too short, and are not well targeted. This is a real problem for me because I rely heavily on the image prompt option to create backgrounds for my photos, so I searched online for tools that generate prompts from an image.

I found "CLIP Interrogator", an extension intended for SDXL, and tested it on Colab. The results were excellent: the wording is well targeted, it includes photographer names and/or style names, and it recognizes brands, sometimes even spelling correctly the brand names on the products I work with. I then used the generated prompts in Fooocus, and the generated images closely resembled the originals.

It would be very kind of you to add this functionality to Fooocus as a tab that creates prompts and sends them directly to the prompt text field for generation. Link:

https://github.com/pharmapsychotic/clip-interrogator

Best regards

Proposed workflow

  1. Go to "Input Image"
  2. Go to "Describe"
  3. Choose a model, e.g. ViT-L/OpenAI
  4. Choose "fast" or "best"
  5. Upload the image you want described
  6. Press "Generate prompt"
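
The workflow above could be sketched roughly as follows. This is only an illustration, not Fooocus code: `pick_interrogate_method` and `generate_prompt` are hypothetical helper names, the model name `ViT-L-14/openai` is assumed to be what "ViT-L/OpenAI" refers to, and the mapping of "fast" and "best" to `interrogate_fast` and `interrogate` follows the clip-interrogator demo app as far as I understand it.

```python
from typing import Any, Callable

# Assumption: "ViT-L/OpenAI" in step 3 means this clip-interrogator model name.
DEFAULT_MODEL = "ViT-L-14/openai"


def pick_interrogate_method(interrogator: Any, mode: str) -> Callable[[Any], str]:
    """Map the UI choice from step 4 ("fast" or "best") to an interrogator method."""
    methods = {
        "fast": "interrogate_fast",  # quicker, usually shorter prompt
        "best": "interrogate",       # slower, more detailed prompt
    }
    if mode not in methods:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(methods)}")
    return getattr(interrogator, methods[mode])


def generate_prompt(image_path: str, mode: str = "best", model: str = DEFAULT_MODEL) -> str:
    """Steps 3 to 6: load the model, read the image, return a prompt string."""
    # Heavy third-party imports are deferred so that importing this module stays cheap.
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    image = Image.open(image_path).convert("RGB")
    interrogator = Interrogator(Config(clip_model_name=model))
    return pick_interrogate_method(interrogator, mode)(image)
```

Calling `generate_prompt("photo.jpg", mode="fast")` would then return a string that could be pasted into the prompt field. The first call downloads model weights, so it needs network access and enough VRAM for the chosen model.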

Additional information

No response

badraymen, May 26 '24 23:05

@badraymen FYI, I'm on it and am currently testing various image captioning models in a separate project: https://github.com/mashb1t/describeiments

The intermediate result is that BLIP (1) (+ BERT) integrates best into Fooocus and has the lowest resource usage, so I'm not sure a switch is worth the effort.

Modified code of https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator/blob/main/app.py can be found in interrogator.py.txt

One can also really overshoot in terms of VRAM with the combination of ViT-H and BLIP 2.

mashb1t, May 31 '24 22:05

Thank you so much @mashb1t for your interest and involvement. Could you explain this in terms understandable to someone who knows nothing about coding or Python? A simplified explanation would be really kind of you. So, are you going to integrate CLIP Interrogator, or are you going to develop a new function in Fooocus for the next version?

badraymen, May 31 '24 23:05

@badraymen sure: no clip-interrogator until it has been fully evaluated and benchmarked. (Also, it's based on transformers, which Fooocus doesn't use.)

mashb1t, May 31 '24 23:05

You are really kind, dear sir. Thank you again for this clarification, and good luck with what you do.

badraymen, Jun 01 '24 00:06