OneTrainer
Clearer handling of cropping and resolutions:
Describe your use-case.
Right now the quick start guide suggests that I don't need to resize my dataset images, because OneTrainer handles this when resolution bucketing is enabled. However, I noticed that with multiple training resolutions selected, a batch size of 1 uses all samples, while a batch size of 2 gives fewer than half as many steps, so some images are no longer used.
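To illustrate what I suspect is going on with the step count, here is a minimal sketch. It assumes (this is my guess, not confirmed OneTrainer behaviour) that every image is assigned to exactly one bucket and that a bucket's leftover images are dropped when they cannot fill a whole batch. The function name and the bucket labels are made up for the example.

```python
from collections import Counter

def steps_per_epoch(bucket_of_each_image, batch_size, drop_incomplete=True):
    """Count batches per epoch when batches never mix buckets.

    ASSUMPTION: incomplete batches are dropped per bucket. This is a guess
    at the behaviour, not something taken from the OneTrainer source.
    """
    counts = Counter(bucket_of_each_image)
    if drop_incomplete:
        # Floor division: leftover images in each bucket are skipped.
        return sum(n // batch_size for n in counts.values())
    # Ceiling division: leftover images form a smaller final batch.
    return sum(-(-n // batch_size) for n in counts.values())

# Hypothetical dataset of 10 images spread over 4 buckets.
buckets = ["512x512"] * 5 + ["512x640"] * 3 + ["640x512"] + ["768x960"]

print(steps_per_epoch(buckets, batch_size=1))  # 10 steps, every image used
print(steps_per_epoch(buckets, batch_size=2))  # 3 steps, only 6 images used
```

With batch size 2 the naive expectation would be 5 steps, but under this assumption 4 of the 10 images are skipped, which would match the "fewer than half" step count I am seeing.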
What actually happens to each image is not really clear. For example:
- I set training resolutions to 512, 640, 768, 960.
- I have a 639x641 image, is it always cropped to 512x640, or sometimes to 512x512?
- I have a 256x320 image, is it upscaled to 512x640 or can sometimes end up at 768x960?
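The ambiguity in the two examples above can be made concrete with a small sketch. It assumes a hypothetical bucket set made of every width/height pair of the listed resolutions, and a hypothetical rule that picks the bucket with the nearest aspect ratio. I don't know whether OneTrainer builds or chooses buckets this way; the point is only that aspect ratio alone leaves several candidates tied.

```python
import math
from itertools import product

def nearest_aspect_buckets(width, height, sides):
    """Return every bucket whose aspect ratio is closest to the image's.

    ASSUMPTION: buckets are all (w, h) pairs of `sides`, and closeness is
    measured on log(w / h). Both are illustrative, not OneTrainer's rules.
    """
    target = math.log(width / height)
    buckets = list(product(sides, repeat=2))

    def distance(bucket):
        return abs(math.log(bucket[0] / bucket[1]) - target)

    best = min(distance(b) for b in buckets)
    # Keep all ties instead of silently choosing one of them.
    return sorted(b for b in buckets if math.isclose(distance(b), best, abs_tol=1e-9))

sides = [512, 640, 768, 960]
print(nearest_aspect_buckets(639, 641, sides))
print(nearest_aspect_buckets(256, 320, sides))
```

Under these assumptions the 639x641 image ties between the four square buckets, and the 256x320 image ties between 512x640 and 768x960, so some extra rule (largest, smallest, random, no upscaling) has to break the tie. That rule is what I would like to see documented.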
I also noticed that the preview is static even with crop jitter enabled. If I have a 1024x512 image, do I get crops such as image[0:960, 0:512] and image[64:1024, 0:512], or are the crops always centered? Will it sometimes be cropped to resolutions other than 960x512?
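To show the difference I am asking about, here is a small sketch of a centered crop versus a jittered one. The function and its parameters are hypothetical and only describe the two behaviours I can imagine; they are not taken from OneTrainer.

```python
import random

def crop_box(img_w, img_h, crop_w, crop_h, jitter=False, rng=None):
    """Return (x0, y0, x1, y1) for a crop of size crop_w x crop_h.

    ASSUMPTION: "jitter" means a uniformly random offset inside the image;
    without jitter the crop is centered. Illustrative only.
    """
    max_x = img_w - crop_w
    max_y = img_h - crop_h
    if max_x < 0 or max_y < 0:
        raise ValueError("crop is larger than the image")
    if jitter:
        rng = rng or random.Random()
        x0 = rng.randint(0, max_x)  # randint includes both endpoints
        y0 = rng.randint(0, max_y)
    else:
        x0 = max_x // 2
        y0 = max_y // 2
    return (x0, y0, x0 + crop_w, y0 + crop_h)

# Centered 960x512 crop of a 1024x512 image starts at x = 32.
print(crop_box(1024, 512, 960, 512))
# With jitter, x0 can be anything from 0 to 64.
print(crop_box(1024, 512, 960, 512, jitter=True, rng=random.Random(0)))
```

If the preview only ever shows the first case while training uses the second, it would help to have the preview draw a new offset each time it is refreshed.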
What would you like to see as a solution?
I have 5 proposals to improve both clarity and training:
- Use all images option: when batch size > 1, always try to have batch_size images for every resolution, even if that means using crops that cover less of the original images
- correctly show crop jitter's effect in the preview (assuming right now it only shows a centered square crop and not what's actually used)
- vary scaling option: if possible, also use samples downscaled to lower resolutions, not only the maximum one
- when using samples below a set resolution (even if upscaled), optionally add a set tag (for example "low resolution, low quality") to the prompt, and likewise when above a certain resolution (for example "high resolution")
- allow setting both horizontal and vertical resolution, so that I can set something like "384, 512x512, 768" and have as a set of allowed resolutions "384x384, 384x768, 512x512, 768x768, 768x384"
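For the last proposal, this is a rough sketch of how I imagine the resolution field could be parsed. The function name and the exact rule are my own assumptions: bare numbers are combined with each other in every width/height pairing, and explicit WxH entries are added exactly as written.

```python
from itertools import product

def expand_resolutions(spec):
    """Expand a spec like "384, 512x512, 768" into allowed (w, h) pairs.

    ASSUMPTION: bare values are crossed with each other, explicit "WxH"
    entries are kept verbatim. This is a proposal, not existing behaviour.
    """
    bare, explicit = [], []
    for token in spec.split(","):
        token = token.strip().lower()
        if not token:
            continue
        if "x" in token:
            w, h = token.split("x")
            explicit.append((int(w), int(h)))
        else:
            bare.append(int(token))
    # Every pairing of the bare values, plus the explicit pairs.
    return sorted(set(product(bare, repeat=2)) | set(explicit))

print(expand_resolutions("384, 512x512, 768"))
```

For the example string this gives the five resolutions listed above: 384x384, 384x768, 512x512, 768x384 and 768x768.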
Have you considered alternatives? List them here.
Right now I could probably keep multiple copies of each image with different resolutions, aspect ratios, and crops, but it would take a lot of copies to truly cover every possible crop of every image.