# Text-Guided-Image-Colorization
This project utilizes the power of Stable Diffusion (SDXL/SDXL-Light) and the BLIP (Bootstrapping Language-Image Pre-training) captioning model to provide an interactive image colorization experience. Users can influence the generated colors of objects within images, making the colorization process more personalized and creative.

## Table of Contents
- Features
- Installation
- Quick Start
- Dataset Usage
- Training
- Evaluation
- Results
- License
## News
- (2024/11/23) The project is now available on Hugging Face Spaces 🎉 Big thanks to @fffiloni!
## Features
- Interactive Colorization: Users can specify desired colors for different objects in the image.
- ControlNet Approach: Enhanced colorization capabilities through retraining with ControlNet, allowing SDXL to better adapt to the image colorization task.
- High-Quality Outputs: Leverage the latest advancements in diffusion models to generate vibrant and realistic colorizations.
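The interactive workflow boils down to appending the user's color phrases to an auto-generated BLIP caption, as the Results tables below suggest (e.g. a caption plus `+ "green shirt"`). As a rough illustration, a hypothetical helper for combining the two (not code from this repository) might look like:

```python
def build_prompt(caption: str, color_hints: list[str]) -> str:
    """Append user-supplied color phrases to an auto-generated caption.

    `caption` would come from the BLIP captioning model; `color_hints`
    are the per-object color requests typed by the user. Illustrative
    only -- not the repository's actual prompt-assembly code.
    """
    hints = ", ".join(h.strip() for h in color_hints if h.strip())
    return f"{caption}, {hints}" if hints else caption
```

For example, `build_prompt("a photo of a truck", ["bright red car"])` returns `"a photo of a truck, bright red car"`, and an empty hint list leaves the caption unchanged.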
## Installation
To set up the project locally, follow these steps:
1. Clone the Repository:

   ```bash
   git clone https://github.com/nick8592/text-guided-image-colorization.git
   cd text-guided-image-colorization
   ```

2. Install Dependencies: Make sure you have Python 3.7 or higher installed. Then, install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

   Install `torch` and `torchvision` matching your CUDA version:

   ```bash
   pip install torch torchvision --index-url https://download.pytorch.org/whl/cuXXX
   ```

   Replace `XXX` with your CUDA version (e.g., `118` for CUDA 11.8). For more info, see PyTorch Get Started.

3. Download Pre-trained Models:
   | Models | Hugging Face |
   |---|---|
   | SDXL-Lightning Caption | link |
   | SDXL-Lightning Custom Caption (Recommended) | link |

   Expected checkpoint layout:

   ```
   text-guided-image-colorization/sdxl_light_caption_output
   └── checkpoint-30000
       ├── controlnet
       │   ├── diffusion_pytorch_model.safetensors
       │   └── config.json
       ├── optimizer.bin
       ├── random_states_0.pkl
       ├── scaler.pt
       └── scheduler.bin
   ```
## Quick Start
1. Run the `gradio_ui.py` script:

   ```bash
   python gradio_ui.py
   ```

2. Open the provided URL in your web browser to access the Gradio-based user interface.

3. Upload an image and use the interface to control the colors of specific objects. A prompt is optional: the model can also colorize the image without one.

4. The model will generate a colorized version of the image based on your input (or automatically, if no prompt is given). See the demo video.
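Under the hood, the conditioning signal for colorization is the grayscale version of the uploaded image. A minimal sketch of that preprocessing step with Pillow (the function name, target size, and the 3-channel replication are assumptions for illustration, not this repository's exact code):

```python
from PIL import Image

def make_condition_image(img: Image.Image, size=(1024, 1024)) -> Image.Image:
    """Turn an input image into a grayscale conditioning image.

    A ControlNet typically expects a 3-channel input, so the single
    luminance channel is replicated across R, G, and B.
    """
    gray = img.convert("L").resize(size)
    return Image.merge("RGB", (gray, gray, gray))
```

Replicating the luminance channel keeps the tensor shape the pipeline expects while carrying no color information for the model to copy.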

## Dataset Usage

You can find more details about dataset usage in the Dataset-for-Image-Colorization repository.
## Training
For training, you can use one of the following scripts:
- `train_controlnet.sh`: Trains a model using Stable Diffusion v2
- `train_controlnet_sdxl.sh`: Trains a model using SDXL
- `train_controlnet_sdxl_light.sh`: Trains a model using SDXL-Lightning
Although the training code for SDXL is provided, I was unable to train that model myself due to a lack of GPU resources, so you may encounter errors when running it.
## Evaluation
For evaluation, you can use one of the following scripts:
- `eval_controlnet.sh`: Evaluates the Stable Diffusion v2 model on a folder of images.
- `eval_controlnet_sdxl_light.sh`: Evaluates the SDXL-Lightning model on a folder of images.
- `eval_controlnet_sdxl_light_single.sh`: Evaluates the SDXL-Lightning model on a single image.
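Colorized outputs are typically scored against the ground-truth images shown in the Results section. The scripts above implement the project's own evaluation; as a generic, hedged example of one common metric (PSNR), not necessarily the one these scripts report:

```python
import numpy as np

def psnr(reference, test, max_val=255.0) -> float:
    """Peak signal-to-noise ratio between two uint8 image arrays.

    Higher is better; identical images yield infinity.
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Perceptual metrics such as LPIPS or colorfulness scores are also common for colorization, since PSNR alone rewards desaturated outputs.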
## Results
### Prompt-Guided

| Caption | Condition 1 | Condition 2 | Condition 3 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
| a photography of a woman in a soccer uniform kicking a soccer ball | + "green shirt" | + "purple shirt" | + "red shirt" |
| ![]() | ![]() | ![]() | ![]() |
| a photography of a photo of a truck | + "bright red car" | + "dark blue car" | + "black car" |
| ![]() | ![]() | ![]() | ![]() |
| a photography of a cat wearing a hat on his head | + "orange hat" | + "pink hat" | + "yellow hat" |
### Prompt-Free

Ground truth images are provided solely for reference in the image colorization task.

| Grayscale Image | Colorized Result | Ground Truth |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
## Read More
Here are some related articles you might find interesting:
- Image Colorization: Bringing Black and White to Life
- Understanding RGB, YCbCr, and Lab Color Spaces
- Comparison Between CLIP and BLIP Models
- A Step-by-Step Guide to Interactive Machine Learning with Gradio
## License
This project is licensed under the MIT License. See the LICENSE file for more details.