understanding-ai
https://github.com/showlab/Image2Paragraph
Summary
- uses BLIP/BLIP-2 to generate a simple caption of the whole image
- uses GRiT (built on Detectron2) to generate dense captions of detected regions
- uses Segment Anything to generate region-level semantic information
- unifies all of the above into a single prompt for GPT
- applies Canny edge detection to the input image (the questionable part) and generates the new image with StableDiffusionControlNetPipeline
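The "unify and prompt" step can be sketched as below. The function name, prompt wording, and data shapes are assumptions for illustration only; the actual Image2Paragraph template may differ.

```python
def build_gpt_prompt(simple_caption, dense_captions, region_semantics):
    """Merge the three caption sources into one instruction for GPT.

    All names and the prompt wording here are hypothetical sketches,
    not the project's exact template.
    """
    dense = "; ".join(dense_captions)
    regions = "; ".join(
        f"{label} at {box}" for label, box in region_semantics
    )
    return (
        "Write one coherent paragraph describing an image.\n"
        f"Overall caption (BLIP): {simple_caption}\n"
        f"Dense captions (GRiT): {dense}\n"
        f"Region semantics (SAM): {regions}\n"
        "Resolve conflicts in favor of the dense captions."
    )

# Example usage with made-up captions and bounding boxes:
prompt = build_gpt_prompt(
    "a dog on a beach",
    ["a brown dog running", "waves on the shore"],
    [("dog", (34, 50, 210, 190)), ("sea", (0, 120, 512, 300))],
)
```

Note that nothing in this merged prompt carries the image's spatial layout beyond coarse boxes, which is consistent with the conclusion below.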
Conclusion
- The output prompt from this project cannot, on its own, generate an image similar to the input; the similarity comes from conditioning ControlNet on the Canny edge map of the input image.