sd-scripts

Regarding the Multi-Level Caption

Open sdbds opened this issue 4 months ago • 3 comments

I'm building a VLM captioning program, possibly using multi-level captions similar to PG3.

As far as I know, most DiT model training since SD3 has used a multi-level caption + random matching strategy.

Since natural-language captions are becoming more and more popular, maybe we can just add a shuffle option that reads the different caption levels.

There are several possible ways to store the captions:

1. Still use separate txt files, but with multiple lines representing the different caption levels (similar to sampler prompts)

2. Load multiple text files with different extensions

3. Use a dictionary file such as JSON to hold the different captions
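
To make the three options concrete, here is a rough Python sketch of how each one could be read and how a level could then be picked at random. This is not existing sd-scripts code: the function names, the `.short.txt` / `.long.txt` extensions, and the JSON layout (image key mapped to a dict of level name and caption) are all assumptions made up for illustration.

```python
import json
import random
from pathlib import Path


def captions_from_lines(txt_path):
    """Option 1: one .txt file, one caption level per non-empty line."""
    text = Path(txt_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def captions_from_extensions(image_path, extensions=(".short.txt", ".long.txt")):
    """Option 2: one file per level, told apart by extension.

    The extension names are hypothetical placeholders.
    """
    base = Path(image_path).with_suffix("")  # drop the image extension
    captions = []
    for ext in extensions:
        candidate = Path(str(base) + ext)
        if candidate.is_file():
            captions.append(candidate.read_text(encoding="utf-8").strip())
    return captions


def captions_from_json(json_path, image_key):
    """Option 3: JSON dict of the assumed form {image_key: {level: caption}}."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return list(data.get(image_key, {}).values())


def pick_caption(captions, rng=random):
    """Pick one caption level at random; empty string if none were found."""
    return rng.choice(captions) if captions else ""
```

Whichever storage format is chosen, the loader only has to return a list of captions per image, so the random pick stays the same for all three.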

sdbds · Sep 26 '24 05:09