[Feat]: Custom Max Token Length for SD/SDXL
Describe your use-case.
Increase the maximum number of tokens supported by SD1.5 and SDXL beyond the current limit of 75.
What would you like to see as a solution?
Increased token length support, even if it's a little hacky.
Have you considered alternatives? List them here.
No response
This would be great. There was a whitepaper on DALL-E 3 where they used descriptive synthetic captions to improve the model. The default limit of 75 tokens is quite restrictive in this regard.
I have created code in this fork (https://github.com/celll1/OneTrainer/tree/dev) that supports token lengths of up to 3 chunks of 75 tokens each (225 tokens total) for the Text Encoder. It has been confirmed to work with SDXL LoRA.
Please note that the handling of BOS/EOS tokens differs from the implementation in sd-scripts. I am not sure whether an attention mask should be applied.
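For anyone following along, here is a rough sketch of what the chunking step could look like. This is not the code from the fork, just an illustration under some assumptions: the function name `chunk_token_ids` is made up, each chunk gets its own BOS/EOS pair (one possible choice, which is exactly the part that may differ between implementations), and short chunks are padded with the EOS id. The ids 49406 and 49407 are the start and end tokens of the CLIP vocabulary.

```python
from typing import List

BOS_ID = 49406   # CLIP <|startoftext|>
EOS_ID = 49407   # CLIP <|endoftext|>
CHUNK_LEN = 75   # content tokens per chunk (77 including BOS and EOS)
MAX_CHUNKS = 3   # 3 x 75 = 225 content tokens at most


def chunk_token_ids(
    token_ids: List[int],
    chunk_len: int = CHUNK_LEN,
    max_chunks: int = MAX_CHUNKS,
    pad_id: int = EOS_ID,  # assumption: pad with EOS; some tokenizers pad differently
) -> List[List[int]]:
    """Split raw token ids (without BOS/EOS) into fixed-size encoder inputs."""
    # Drop anything beyond the supported total length.
    token_ids = token_ids[: chunk_len * max_chunks]

    chunks = []
    # max(..., 1) makes sure an empty prompt still yields one chunk.
    for start in range(0, max(len(token_ids), 1), chunk_len):
        body = token_ids[start:start + chunk_len]
        padding = [pad_id] * (chunk_len - len(body))
        # Hypothetical choice: wrap every chunk in its own BOS/EOS pair.
        chunks.append([BOS_ID] + body + [EOS_ID] + padding)
    return chunks


if __name__ == "__main__":
    fake_prompt = list(range(1, 201))  # 200 dummy token ids
    for chunk in chunk_token_ids(fake_prompt):
        print(len(chunk), chunk[:3], "...")
```

Each 77-token chunk would then be passed through the text encoder separately and the outputs joined along the sequence dimension. Whether the padded positions should be masked out is the open question about the attention mask.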
Hop on the Discord and ask Nerogar. Once you've discussed it with him and he's reviewed it (and it comes behind a flag/checkbox), I am confident he would accept a PR.