
Loss and LR not recorded on wandb

Open · kf1111 opened this issue 2 years ago · 11 comments


I tried to record the loss and learning rate of my LoRA training run in wandb, but only GPU information was recorded. My config.toml file contains the following settings:

log_with = "wandb"
log_tracker_name = "lora_0511"
wandb_api_key = "apikey"

pretrained_model_name_or_path = "....ckpt"
train_data_dir = "..."

shuffle_caption = true
caption_extension = ".txt"
keep_tokens = 20
resolution = "768"
vae_batch_size = 4
enable_bucket = true
output_dir = "..."
output_name = "..."
save_precision = "fp16"
save_every_n_epochs = 10

train_batch_size = 2
gradient_checkpointing = true
gradient_accumulation_steps = 64

max_token_length = 150
xformers = true
max_train_epochs = 50
persistent_data_loader_workers = true
seed = 42
mixed_precision = "bf16"
clip_skip = 2

multires_noise_iterations = 6
multires_noise_discount = 0.1

flip_aug = true
use_8bit_adam = true
lr_scheduler = "cosine_with_restarts"
lr_warmup_steps = 12
lr_scheduler_num_cycles = 10
unet_lr = 0.0004
text_encoder_lr = 0.0002
network_module = "networks.lora"
network_dim = 64
network_alpha = 32.0

I read https://github.com/kohya-ss/sd-scripts/pull/428 and understood from it that "logging_dir" can be left unset.
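A later comment in this thread reports that sd-scripts only logs metrics such as loss when logging_dir is provided. Assuming that is correct, a small pre-flight check on the parsed config would have flagged this config before training. This is a minimal sketch: check_tracker_config and WANDB_TRACKERS are hypothetical names, not part of sd-scripts, and treating log_with = "all" as including wandb is an assumption. The function takes the config already parsed into a dict (for example with tomllib on Python 3.11+).

```python
from typing import Any

# Assumption: these log_with values route metrics to wandb.
WANDB_TRACKERS = ("wandb", "all")


def check_tracker_config(config: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check, not part of sd-scripts.

    Returns warnings for settings under which loss/lr would likely not
    reach wandb, assuming metric logging is gated on logging_dir.
    """
    problems: list[str] = []
    log_with = config.get("log_with")
    if log_with in WANDB_TRACKERS and not config.get("logging_dir"):
        problems.append(
            f'log_with = "{log_with}" but logging_dir is unset: '
            "only system metrics (GPU etc.) may reach wandb"
        )
    return problems


# Tracker keys from the config in this issue (no logging_dir entry).
issue_config = {"log_with": "wandb", "log_tracker_name": "lora_0511"}
for msg in check_tracker_config(issue_config):
    print("warning:", msg)
```

The check only inspects keys, so it works the same whether the values came from config.toml or were assembled from command-line flags.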

kf1111, May 10 '23

Just tried it and it's recording for me.

[Screenshot: Weights & Biases dashboard, 2023-05-11]

rockerBOO, May 12 '23

Could there be a problem with my Python environment?

Package Version
absl-py 1.4.0
accelerate 0.15.0
aiohttp 3.8.4
aiosignal 1.3.1
albumentations 1.3.0
altair 4.2.2
appdirs 1.4.4
astunparse 1.6.3
async-timeout 4.0.2
attrs 23.1.0
bitsandbytes 0.38.1
cachetools 5.3.0
certifi 2022.12.7
charset-normalizer 2.1.1
click 8.1.3
colorama 0.4.6
diffusers 0.10.2
docker-pycreds 0.4.0
easygui 0.98.3
einops 0.6.0
entrypoints 0.4
fairscale 0.4.13
filelock 3.9.0
flatbuffers 23.5.8
frozenlist 1.3.3
fsspec 2023.5.0
ftfy 6.1.1
gast 0.4.0
gitdb 4.0.10
GitPython 3.1.31
google-auth 2.18.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.54.0
h5py 3.8.0
huggingface-hub 0.13.3
idna 3.4
imageio 2.28.1
importlib-metadata 6.6.0
Jinja2 3.1.2
joblib 1.2.0
jsonschema 4.17.3
keras 2.10.0
Keras-Preprocessing 1.1.2
lazy_loader 0.2
libclang 16.0.0
library 0.0.0
lightning-utilities 0.8.0
Markdown 3.4.3
MarkupSafe 2.1.2
mpmath 1.2.1
multidict 6.0.4
mypy-extensions 1.0.0
networkx 3.0
numpy 1.24.1
oauthlib 3.2.2
opencv-python 4.7.0.68
opencv-python-headless 4.7.0.72
opt-einsum 3.3.0
packaging 23.1
pandas 2.0.1
pathtools 0.1.2
Pillow 9.3.0
pip 23.0.1
protobuf 3.19.6
psutil 5.9.5
pyasn1 0.5.0
pyasn1-modules 0.3.0
pyre-extensions 0.0.29
pyrsistent 0.19.3
python-dateutil 2.8.2
pytorch-lightning 1.9.0
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0
qudida 0.0.4
regex 2023.5.5
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
safetensors 0.2.6
scikit-image 0.20.0
scikit-learn 1.2.2
scipy 1.10.1
sentry-sdk 1.22.2
setproctitle 1.3.2
setuptools 65.5.0
six 1.16.0
smmap 5.0.0
sympy 1.11.1
tensorboard 2.10.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.10.1
tensorflow-estimator 2.10.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor 2.3.0
threadpoolctl 3.1.0
tifffile 2023.4.12
timm 0.6.12
tokenizers 0.13.3
toml 0.10.2
toolz 0.12.0
torch 2.0.0+cu118
torchmetrics 0.11.4
torchvision 0.15.1+cu118
tqdm 4.65.0
transformers 4.26.0
typing_extensions 4.4.0
typing-inspect 0.8.0
tzdata 2023.3
urllib3 1.26.13
voluptuous 0.13.1
wandb 0.15.2
wcwidth 0.2.6
Werkzeug 2.3.4
wheel 0.40.0
wrapt 1.15.0
xformers 0.0.19
yarl 1.9.2
zipp 3.15.0

kf1111, May 13 '23

Which sd-scripts commit are you currently on?

rockerBOO, May 13 '23

3b1af3f1a63b858af8c12662cbae70654229e327

kf1111, May 13 '23

The same issue occurs with the latest commit, c924c47f374ac1b6e33e71f82948eb1853e2243f

kf1111, May 16 '23

Same here. Has this been resolved?

cian0, Jun 20 '23

Same issue!

axel578, Aug 11 '23

Same issue!

1099271, Nov 15 '23

I am looking into this issue. Can anyone who has it confirm whether any wandb warnings appear in their terminal/command prompt/.bat output?

rockerBOO, Nov 20 '23

Haven't used sd-scripts for a long time, but I have some old wandb logs that might help.

epoch 1/100
F:\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
F:\sd-scripts\venv\lib\site-packages\xformers\ops\fmha\flash.py:338: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
epoch 2/100
epoch 3/100
epoch 4/100
epoch 5/100
epoch 6/100
epoch 7/100
epoch 8/100
epoch 9/100
epoch 10/100

kf1111, Nov 23 '23

Per https://github.com/kohya-ss/sd-scripts/blob/f8f5b1695842cce15ba14e7edfacbeee41e71a75/train_network.py#L952, metric logging (such as loss) is only enabled when you provide a --logging_dir parameter.
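The gating described above can be illustrated with a simplified, hypothetical sketch. It is not the actual sd-scripts code: Args, RecordingTracker and log_step are illustrative stand-ins, and the "loss"/"lr" keys are placeholders rather than the metric names sd-scripts uses. The idea is that the tracker run can exist (so wandb still shows system metrics such as GPU usage, consistent with the symptom in this issue) while per-step values are never forwarded unless logging_dir is set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Args:
    # Hypothetical mirror of the two relevant options (config.toml or CLI).
    log_with: Optional[str] = None
    logging_dir: Optional[str] = None


@dataclass
class RecordingTracker:
    # Stand-in for a real tracker: it just remembers what it was sent.
    logged: list[tuple[int, dict[str, float]]] = field(default_factory=list)

    def log(self, values: dict[str, float], step: int) -> None:
        self.logged.append((step, values))


def log_step(args: Args, tracker: RecordingTracker, step: int,
             loss: float, lr: float) -> None:
    # Simplified version of the gate reported in the thread:
    # step metrics are forwarded only when logging_dir is set.
    if args.logging_dir is not None:
        tracker.log({"loss": loss, "lr": lr}, step=step)


tracker = RecordingTracker()
log_step(Args(log_with="wandb"), tracker, step=1, loss=0.12, lr=4e-4)
log_step(Args(log_with="wandb", logging_dir="logs"), tracker, step=2, loss=0.11, lr=4e-4)
print(tracker.logged)  # only the step-2 entry is recorded
```

If that reading is right, adding a logging_dir entry (any writable directory) next to log_with = "wandb" in config.toml should be enough for loss and learning rate to show up in wandb.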

emcmanus, Sep 05 '24