Giyeong Oh
Can you attach your environment details? Number of GPUs, accelerate configuration, installed Python libraries, training configuration, the script used to run one of the sd-scripts, etc. If you provide as much detail as possible, you...
While working on the above problem, I met another error: `TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object`. Same environment, but ~~training the SDXL network~~ when I changed to a smaller dataset, it works on the SDXL network...
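For reference, a minimal repro sketch of that pickling failure, assuming a distributed run (e.g. launched via `torchrun`). Anything that implicitly pickles an object holding a `ProcessGroup` handle, such as spawning DataLoader workers or a `deepcopy`, hits the same error:

```python
import pickle
import torch.distributed as dist

dist.init_process_group("nccl")  # assumes a torchrun/accelerate-style launch
pg = dist.group.WORLD            # a live ProcessGroup handle

# ProcessGroup wraps communicator state that cannot be serialized:
pickle.dumps(pg)  # TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object
```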
cache_text_encoder_outputs.py raises AttributeError: 'Namespace' object has no attribute 'deepspeed'
It is because `cache_text_encoder_outputs.py` does not prepare the DeepSpeed config, unlike the `train_*.py` scripts. You can add an ad-hoc fix for this:

1) `from library import deepspeed_utils`
2) between lines 174-178, register the DeepSpeed arguments next to the existing ones:

```
train_util.add_sd_models_arguments(parser)
deepspeed_utils.add_deepspeed_arguments(parser)  # adds the missing --deepspeed args
```
Did you update this repo? Try the script again after a `git pull`.
First, thanks for reporting the issue. Is there a similar phenomenon on a different dataset?
> @BootsofLagrangian yes, on all datasets when using LION optimizer.

I'm not sure; maybe the Lion optimizer just doesn't work as well as the Adam-family optimizers with DeepSpeed... But it doesn't break...
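For context, here is a sketch of Lion's update rule (from the Lion paper, not DeepSpeed's implementation). Because the step is sign-based, small precision differences in the momentum buffer can flip individual signs, which is one plausible way a mixed-precision wrapper could affect it more than Adam:

```python
import torch

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # Interpolate momentum and gradient, then keep only the sign.
    update = torch.sign(beta1 * momentum + (1 - beta1) * grad)
    new_param = param - lr * (update + wd * param)        # decoupled weight decay
    new_momentum = beta2 * momentum + (1 - beta2) * grad  # momentum updated after the step
    return new_param, new_momentum
```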
First, the U-Net can consume a batch of text-encoder outputs shaped like [n, **77**, 768]. So the training scripts exploit this property to extend the token length to 75, 150, 225, and so on; see the sketch below....
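A minimal sketch of that chunk-and-concatenate trick, assuming a `transformers` `CLIPTextModel`-style encoder; the helper name and chunking details are illustrative, not the actual sd-scripts code:

```python
import torch

BOS, EOS, CHUNK = 49406, 49407, 75  # CLIP special tokens; 75 content tokens per chunk

def encode_long_prompt(text_encoder, token_ids: torch.Tensor) -> torch.Tensor:
    # token_ids: [n, L] content token ids without BOS/EOS, L a multiple of 75.
    n, L = token_ids.shape
    outs = []
    for i in range(0, L, CHUNK):
        chunk = token_ids[:, i : i + CHUNK]
        bos = torch.full((n, 1), BOS, dtype=chunk.dtype, device=chunk.device)
        eos = torch.full((n, 1), EOS, dtype=chunk.dtype, device=chunk.device)
        ids = torch.cat([bos, chunk, eos], dim=1)         # [n, 77] per chunk
        outs.append(text_encoder(ids).last_hidden_state)  # [n, 77, 768]
    # The U-Net's cross-attention accepts any sequence length, so [n, 77*k, 768] works.
    return torch.cat(outs, dim=1)
```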
If a user wishes to use multiple captions, derived from raw data, a tagger, or a Vision-Language Model (VLM), the script could handle this through an alternative format or file, as in the sketch below....
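As a hypothetical example (not an existing sd-scripts format), a metadata file could hold a caption list per image, and the dataset could sample one caption per access:

```python
import json, random

# Hypothetical metadata layout:
# {"img_001": {"captions": ["a photo of a cat", "cat, indoors, sitting"]}}
def pick_caption(metadata_path: str, image_key: str) -> str:
    with open(metadata_path) as f:
        meta = json.load(f)
    # Sampling at access time lets each epoch see a different caption.
    return random.choice(meta[image_key]["captions"])
```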
> @BootsofLagrangian Do you have any idea what might be causing this problem?

Interesting behavior. DeepSpeed upcasts precision for its optimizer operations. It might be one of the reasons, but...
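For illustration, a sketch of where that upcast lives in a DeepSpeed config (values are assumptions, not a recommended setup): with bf16 enabled, ZeRO still keeps fp32 master weights and optimizer states, so the optimizer math runs in higher precision than the model:

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},          # forward/backward run in bf16
    "zero_optimization": {"stage": 2},  # partitions the fp32 optimizer states across ranks
}
```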
@jihnenglin I saw loss divergence under some unknown conditions, but I still cannot find the reason why the model diverges.