
[Feat]: Slider Support

Praecordi opened this issue 10 months ago • 15 comments • Status: Open

Describe your use-case.

Sliders are a really great way to edit images. They allow users to control image generation according to their vision, or to inpaint existing images to match it. While slider training has been implemented in other repositories, for OneTrainer to be a comprehensive training solution, I believe support for sliders is essential.

The implementation of sliders is slightly different from standard LoRAs. The popular repository for training sliders is currently LECO, which is based on this paper by Rohit Gandikota. A follow-up paper presents a method that offers better control and composability.

What would you like to see as a solution?

I believe the idea of sliders can easily be integrated into OneTrainer's existing notion of concepts and configs. Namely, we can designate one concept as the base concept and associate another concept with the positive direction. Optionally, a third concept can be associated with the negative direction for finer control. The goal would be a LoRA that can strengthen or weaken certain aspects as its weight is increased or decreased. Allowing more concepts could enable finer triggers within a single slider, e.g. one slider for the size of hands and feet, where specifying only "large hands" in the prompt increases only the size of the hands.

I have not read the papers or examined the repositories in enough detail to suggest an under-the-hood implementation, but a cursory glance does not indicate a process too different from training LoRAs.

Have you considered alternatives? List them here.

One alternative, mentioned in the following thread, involves subtracting one LoRA from another. I haven't had the opportunity to test the effectiveness of this method, but regardless of its effectiveness, it requires training two LoRAs, which may be expensive (in terms of time and money). Implementing LECOs and sliders is a much more efficient way to train these types of LoRAs.
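For context, the subtraction alternative amounts to merging two trained adapters with opposite signs. A minimal illustrative sketch of the idea (the key names are hypothetical, and plain lists of floats stand in for the per-layer tensors a real LoRA checkpoint contains):

```python
def subtract_loras(lora_a, lora_b, alpha=1.0):
    """Build a 'slider'-like adapter as the difference of two LoRA
    state dicts. Values are lists of floats here for illustration;
    real checkpoints hold per-layer up/down weight tensors."""
    if lora_a.keys() != lora_b.keys():
        raise ValueError("LoRAs must share the same parameter names")
    return {
        name: [a - alpha * b for a, b in zip(lora_a[name], lora_b[name])]
        for name in lora_a
    }

# Toy example with two hypothetical parameters per adapter.
diff = subtract_loras(
    {"layer0.up": [1.0, 2.0], "layer0.down": [0.5, 0.5]},
    {"layer0.up": [0.25, 1.0], "layer0.down": [0.5, 0.0]},
)
print(diff["layer0.up"])  # [0.75, 1.0]
```

This illustrates why the approach is costly: both input adapters must be fully trained before the subtraction can even be computed.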

Praecordi avatar Feb 18 '25 01:02 Praecordi

I am working on this. I have to disagree with you though that it can be easily integrated :)

Among other things,

  • it cannot be multiple concepts for text sliders, because the positive, neutral, and (optionally) negative predictions must be available in the same training step, while multiple concepts can end up in different training steps. So we need multiple prompts within the same concept, and MGDS must be extended.
  • The predictions are done on the prior model, not on the trained model. The original slider code does this by unhooking the LoRA from the transformer. While this can easily be hacked (as done here: https://github.com/Nerogar/OneTrainer/pull/505), it is more difficult to implement generically for all model types.
  • OT trains on image samples, i.e. your training data set. While this still works somewhat, even for bias training with multiple prompts, the slider paper instead generates samples: a noisy sample is produced by the student model up to a certain timestep, and the next prediction, which is the actual training step, is made on this generated sample by the teacher model with the differential prompts. This is necessary to fully learn the difference between positive and neutral.
  • There is more...
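The difference-driven objective behind these points can be sketched roughly as follows (an illustrative sketch only, not OneTrainer code; plain Python lists stand in for latent tensors, and `scale` is the slider strength):

```python
def slider_target(neutral, positive, negative, scale):
    """Concept-sliders style target: push the neutral prediction
    toward the positive direction and away from the negative one.
    All three predictions come from the prior (LoRA-free) model
    in the SAME training step."""
    return [
        n + scale * (p - neg)
        for n, p, neg in zip(neutral, positive, negative)
    ]

def mse(pred, target):
    """Simple reconstruction loss between the trained model's
    prediction and the differential target."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)

# Toy example: prior-model predictions for one noisy sample under
# the neutral / positive / negative prompts.
target = slider_target([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], scale=0.5)
print(target)  # [0.5, -0.5]
loss = mse([0.25, -0.25], target)  # trained-model prediction vs. target
```

The key structural requirement visible here is the first bullet above: all three prompt predictions must exist inside one step before the target can be formed.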

dxqb avatar Feb 18 '25 06:02 dxqb

Some experimental samples below.

They were generated with OneTrainer, using "a man with his dog" as the target prompt, but with "dog" bias-trained towards "dragon". The student model was Flux; the teacher model was SDXL.

[4 sample images attached]

dxqb avatar Feb 18 '25 07:02 dxqb

I am working on this. I have to disagree with you though that it can be easily integrated :) […]

Perhaps I spoke too zealously and oversimplified.

  • I see what you mean about multiple concepts. It was just an idea that I'm not too attached to.
  • What is wrong with Nerogar's suggestion in this comment, a generic function in the model class that toggles all LoRAs?
  • Also, I'm not sure where you're getting the student-teacher model for learning. The paper I'm referring to, "Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models", uses only a single model, with and without LoRAs. Maybe I missed something.
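The "toggle all LoRAs" idea could look roughly like this (a hypothetical sketch; `lora_modules` and `enabled` are assumed names, not OneTrainer's actual API):

```python
from contextlib import contextmanager

class LoRAModule:
    """Stand-in for a hooked LoRA layer with an on/off switch."""
    def __init__(self):
        self.enabled = True

class Model:
    def __init__(self, lora_modules):
        self.lora_modules = lora_modules

    @contextmanager
    def loras_disabled(self):
        """Temporarily disable every LoRA so predictions come from
        the prior (un-adapted) weights."""
        previous = [m.enabled for m in self.lora_modules]
        for m in self.lora_modules:
            m.enabled = False
        try:
            yield self
        finally:  # always restore, even if the prediction raises
            for m, state in zip(self.lora_modules, previous):
                m.enabled = state

model = Model([LoRAModule(), LoRAModule()])
with model.loras_disabled():
    assert not any(m.enabled for m in model.lora_modules)
assert all(m.enabled for m in model.lora_modules)  # restored afterwards
```

The try/finally restoration is the part that's easy to get wrong when hacking this per model type.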

Also, do you have a link to this branch?

Praecordi avatar Feb 20 '25 18:02 Praecordi

Also, I'm not sure where you're getting the student-teacher model for learning. The paper I'm referring to, "Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models", uses only a single model, with and without LoRAs. Maybe I missed something.

Yes, in the case of the slider paper the model without the LoRA is the teacher model, and the model with the LoRA is the student model. I have also experimented with other teacher models, but my point above was not about which model is used as the teacher. It is that, according to the paper, the training samples for text slider training have to be generated from the student model, not just read from an image file as OneTrainer does otherwise, which is another large change that has to be implemented to support slider training.
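The sample-generation step described here could be sketched like this (purely illustrative; `student_denoise_step` is a hypothetical placeholder for a real model call, and a plain list stands in for the latent):

```python
def generate_training_sample(student_denoise_step, start_latent, stop_timestep,
                             num_timesteps=1000):
    """Partially denoise pure noise with the student model, then return
    the intermediate latent plus the timestep to train on. The teacher
    model would predict on this latent for each differential prompt."""
    latent = start_latent
    # Walk from full noise down to the chosen stop timestep.
    for t in range(num_timesteps, stop_timestep, -1):
        latent = student_denoise_step(latent, t)
    return latent, stop_timestep

# Toy stand-in: each "denoising step" just halves the latent.
fake_step = lambda latent, t: [x * 0.5 for x in latent]
latent, t = generate_training_sample(fake_step, [1.0], stop_timestep=990)
# After 10 toy steps, latent[0] == 0.5 ** 10; the teacher model would
# now make its prediction on `latent` at timestep `t`.
```

The point is that the data pipeline must call the (partially trained) student model during loading, rather than decoding a file from disk.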

Also, do you have a link to this branch?

The experimental code is not public.

dxqb avatar Feb 20 '25 19:02 dxqb

Here is experimental image slider code: https://github.com/Nerogar/OneTrainer/compare/master...dxqbYD:OneTrainer:image_slider

It can be used, but your dataset has to be prepared in a special way. Only batch size 1 (BS1), only Flux (unless you copy the code to other models).

dxqb avatar Mar 08 '25 13:03 dxqb

Here is experimental image slider code: master...dxqbYD:OneTrainer:image_slider […]

What if I also use masks, does it work? Should I put the same mask in both the "positive" and "negative" folders?

deniaud avatar Mar 09 '25 15:03 deniaud

What if I also use masks, does it work? Should I put the same mask in both the "positive" and "negative" folders?

should work. Can be the same mask, but doesn't have to be.

dxqb avatar Mar 09 '25 15:03 dxqb

What if I also use masks, does it work? Should I put the same mask in both the "positive" and "negative" folders?

should work. Can be the same mask, but doesn't have to be.

What changes do I need to make to the UI so that I can use this without the CLI?

At the moment, I get the following error (the dataset tool does not see the contents of the positive and negative folders):

```
Traceback (most recent call last):
  File "/home/deniaud/onetrainer/conda_env/lib/python3.10/tkinter/__init__.py", line 1921, in __call__
    return self.func(*args)
  File "/home/deniaud/onetrainer/modules/ui/ConceptTab.py", line 102, in <lambda>
    lambda event: open_command(self.i, (self.ui_state, self.image_ui_state, self.text_ui_state))
  File "/home/deniaud/onetrainer/modules/ui/ConfigList.py", line 217, in __open_element_window
    window = self.open_element_window(i, ui_state)
  File "/home/deniaud/onetrainer/modules/ui/ConceptTab.py", line 38, in open_element_window
    return ConceptWindow(self.master, self.current_config[i], ui_state[0], ui_state[1], ui_state[2])
  File "/home/deniaud/onetrainer/modules/ui/ConceptWindow.py", line 85, in __init__
    self.image_augmentation_tab = self.__image_augmentation_tab(tabview.add("image augmentation"))
  File "/home/deniaud/onetrainer/modules/ui/ConceptWindow.py", line 240, in __image_augmentation_tab
    image_preview, filename_preview, caption_preview = self.__get_preview_image()
  File "/home/deniaud/onetrainer/modules/ui/ConceptWindow.py", line 461, in __get_preview_image
    image_tensor = image_tensor * mask_tensor
RuntimeError: The size of tensor a (1017) must match the size of tensor b (1024) at non-singleton dimension 2
```

deniaud avatar Mar 10 '25 16:03 deniaud

RuntimeError: The size of tensor a (1017) must match the size of tensor b (1024) at non-singleton dimension 2

That looks like an error unrelated to slider training. Your image and your mask must have the same dimensions, but one of them seems to be only 1017 pixels instead of 1024 along one dimension.
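A quick pre-flight check over the dataset can catch such mismatches before training. A minimal sketch, assuming you read each pair's sizes beforehand (e.g. with Pillow's `Image.open(...).size`, which returns `(width, height)`):

```python
def check_pair(image_size, mask_size, name=""):
    """Raise early if an image and its mask differ in size.
    Sizes are (width, height) tuples, as PIL's Image.size returns."""
    if image_size != mask_size:
        raise ValueError(
            f"{name}: image is {image_size} but mask is {mask_size}; "
            "resize the mask to match the image"
        )

check_pair((1024, 1024), (1024, 1024), "ok.png")  # passes silently
try:
    check_pair((1017, 1024), (1024, 1024), "bad.png")
except ValueError as e:
    print(e)
```

Running this over every image/mask pair in the concept folders would pinpoint the offending file instead of failing inside the preview code.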

dxqb avatar Mar 10 '25 16:03 dxqb

Here are some training results using the image slider code linked above. On the left, Flux without the LoRA; on the right, with a LoRA trained on realistic natural portraits, with Flux-style portraits as negatives.

[3 comparison images attached]

dxqb avatar Mar 10 '25 16:03 dxqb

Here is experimental image slider code: master...dxqbYD:OneTrainer:image_slider […]

Experimental text slider code is now here: https://github.com/dxqbYD/OneTrainer/tree/text_slider (only BS1, only Flux).

dxqb avatar Mar 16 '25 09:03 dxqb

[sample image attached]

dxqb avatar Mar 16 '25 09:03 dxqb

Thanks for working on this! I'm trying to train some sliders for fun, and I'm wondering if there will be support for SDXL-based models. I'm having issues setting up the other available scripts, but I have OneTrainer set up correctly, so should I wait for this feature to merge or keep debugging the setup for the other scripts?

jancodes avatar Mar 26 '25 15:03 jancodes

@dxqbYD Is SDXL support for text sliders possible?

deniaud avatar Apr 26 '25 13:04 deniaud

@dxqbYD Is SDXL support for text sliders possible?

I've linked experimental code for text sliders above; it can be used. Full integration into OneTrainer requires a lot of MGDS work first.

dxqb avatar Apr 26 '25 13:04 dxqb