unilm TextDiffuser - When does the model start to predict plausible results?

Describe Model I am using: TextDiffuser

Hi, thanks for the great work. I'm trying to train the model on a portion of the Mario-Laion image dataset (~50k images), but the images generated during intermediate training steps currently look like complete noise. Is this because training is still at an early stage (currently 2,500 iterations)? Could you let me know when you started to see plausible results? If you did see results early on, I'll know that I should look for a bug on my side. Thanks,

Jul 06 '23 01:07 other-ones

Thanks for your interest in TextDiffuser. Please note that the visualization during training does not use classifier-free guidance. Once the visualization produces legible text (even if the background looks unnatural), you can switch to inference.py and use a CFG scale of 7.5 for visualization.
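
For reference, classifier-free guidance (CFG) in its usual formulation combines an unconditional and a conditional noise prediction as eps = eps_uncond + s * (eps_cond - eps_uncond), where s is the guidance scale (7.5 above). The sketch below shows only that combination step; it is not TextDiffuser's inference.py, and the function name is hypothetical. It works elementwise on floats, NumPy arrays, or torch tensors. With s = 1.0 it reduces to the conditional prediction alone, which is presumably closer to what the unguided training-time visualization shows.

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale=7.5):
    """Standard CFG combination of two noise predictions (hypothetical helper name).

    Works elementwise on floats, NumPy arrays, or torch tensors.
    """
    # Move the prediction away from the unconditional estimate, toward the conditional one.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


print(classifier_free_guidance(0.0, 1.0))       # 7.5
print(classifier_free_guidance(0.2, 0.5, 1.0))  # scale 1.0 returns the conditional prediction
```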

It is hard to say when training on only a small amount of data. When training on 1M samples, the model produced better results after about five epochs.
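
To relate "epochs" to the iteration counts used in this thread, a rough conversion is steps_per_epoch = ceil(num_samples / (per-GPU batch × GPUs × gradient accumulation)). A minimal sketch, assuming one iteration means one optimizer step and ignoring drop_last and per-process sharding details; the helper name and the batch size of 4 on one GPU are assumptions for illustration:

```python
import math


def optimizer_steps(num_samples, epochs, per_device_batch, num_gpus=1, grad_accum=1):
    """Approximate optimizer steps needed to make `epochs` passes over the data."""
    global_batch = per_device_batch * num_gpus * grad_accum
    steps_per_epoch = math.ceil(num_samples / global_batch)
    return steps_per_epoch * epochs


# The ~50k-image subset from the question, assuming batch size 4 on one GPU:
print(optimizer_steps(50_000, 1, 4))  # 12500 steps per epoch
print(optimizer_steps(50_000, 5, 4))  # 62500 steps for five epochs
```

Under these assumptions, 2,500 iterations cover only about 0.2 epoch of the subset, and five epochs over 50k images show the model 20 times fewer samples than five epochs over 1M.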

Jul 06 '23 02:07 JingyeChen

Hi, thanks for the comment. It seems the model is not being trained. Below are the images generated with the inference.py script after 25k iterations.

[results image]

Could you suggest which part I should look at more closely to fix this? Thanks

Jul 06 '23 14:07 other-ones

I upgraded xformers from 0.0.16 to 0.0.17 as suggested in the code, and I'm now getting some plausible results. However, GPU memory consumption has increased so much that I cannot fit the model for training even with batch size 1 on 24GB GPUs in multi-GPU mode. I can train with batch size 4 in single-GPU mode, but memory consumption becomes much heavier in the multi-GPU setting. My GPUs may have less memory than yours, but since I cannot fit even batch size 1, I suspect something else is wrong. Is this normal, or should I suspect a problem in my setup? Here are the installed packages and their versions for your information:

accelerate 0.18.0
aiohttp 3.8.4
aiosignal 1.3.1
async-timeout 4.0.2
attrs 23.1.0
blinker 1.6.2
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.3
datasets 2.11.0
diffusers 0.17.0.dev0 /home/twkim/project/textdiffuser/diffusers
dill 0.3.6
filelock 3.12.0
Flask 2.3.2
frozenlist 1.3.3
fsspec 2023.5.0
huggingface-hub 0.15.1
idna 3.4
importlib-metadata 6.6.0
itsdangerous 2.1.2
Jinja2 3.1.2
lit 16.0.5.post0
MarkupSafe 2.1.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
mypy-extensions 1.0.0
networkx 3.1
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
opencv-python 4.1.2.30
packaging 23.1
pandas 2.0.2
Pillow 7.2.0
pip 23.1.2
psutil 5.9.5
pyarrow 12.0.0
pyre-extensions 0.0.23
PySnooper 1.1.1
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
regex 2023.6.3
requests 2.31.0
responses 0.18.0
setuptools 67.8.0
six 1.16.0
sympy 1.12
termcolor 2.3.0
tinydb 4.7.1
tokenizers 0.13.3
torch 2.0.0
TorchSnooper 0.8
torchvision 0.15.2
tqdm 4.65.0
transformers 4.27.4
triton 2.0.0
typing_extensions 4.6.3
typing-inspect 0.9.0
tzdata 2023.3
urllib3 2.0.2
Werkzeug 2.3.4
wheel 0.38.4
xformers 0.0.17
xxhash 3.2.0
yarl 1.9.2
zipp 3.15.0

Thanks in advance for your help.

Jul 06 '23 15:07 other-ones

That seems odd. Did you use accelerate config to specify the number of GPUs to use?

Jul 06 '23 17:07 JingyeChen

accelerate==0.18.0
aiofiles==23.1.0
aiohttp==3.8.4
aiosignal==1.3.1
albumentations==1.3.0
altair==5.0.1
anyio==3.7.0
async-timeout==4.0.2
attrs==23.1.0
braceexpand==0.1.7
certifi==2022.12.7
charset-normalizer==3.1.0
click==8.1.3
cmake==3.26.3
contourpy==1.0.7
cycler==0.11.0
dataclasses==0.6
datasets==2.11.0
-e git+https://github.com/JingyeChen/diffusers.git@90d9acf2cbb29dfdd0f2204435c4c3f9d11381f0#egg=diffusers
dill==0.3.6
docker-pycreds==0.4.0
exceptiongroup==1.1.1
ExifRead-nocycle==3.0.1
fastapi==0.96.0
ffmpy==0.3.0
filelock==3.12.0
fire==0.4.0
fonttools==4.39.4
frozenlist==1.3.3
fsspec==2022.11.0
ftfy==6.1.1
gitdb==4.0.10
GitPython==3.1.31
gradio==3.33.1
gradio_client==0.2.5
h11==0.14.0
httpcore==0.17.2
httpx==0.24.1
huggingface-hub==0.14.1
idna==3.4
imageio==2.28.0
img2dataset==1.41.0
importlib-metadata==6.6.0
importlib-resources==5.12.0
Jinja2==3.1.2
joblib==1.2.0
jsonschema==4.17.3
kiwisolver==1.4.4
lazy_loader==0.2
linkify-it-py==2.0.2
lit==16.0.2
markdown-it-py==2.2.0
MarkupSafe==2.1.3
matplotlib==3.7.1
mdit-py-plugins==0.3.3
mdurl==0.1.2
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==1.0.0
networkx==3.1
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
opencv-python==4.1.2.30
opencv-python-headless==4.7.0.72
orjson==3.9.0
packaging==23.1
pandas==1.5.3
pathtools==0.1.2
Pillow==8.4.0
pkgutil_resolve_name==1.3.10
promise==2.3
protobuf==3.20.3
psutil==5.9.5
pyarrow==8.0.0
pycocoevalcap==1.2
pycocotools==2.0.6
pydantic==1.10.8
pydub==0.25.1
Pygments==2.15.1
pyparsing==3.0.9
pyre-extensions==0.0.23
pyrsistent==0.19.3
PySnooper==1.1.1
python-dateutil==2.8.2
python-multipart==0.0.6
pytorch-fid==0.3.0
pytz==2023.3
PyWavelets==1.4.1
PyYAML==6.0
qudida==0.0.4
regex==2023.3.23
requests==2.29.0
responses==0.18.0
scikit-image==0.20.0
scikit-learn==1.2.2
scipy==1.9.1
semantic-version==2.10.0
sentencepiece==0.1.99
sentry-sdk==1.21.0
setproctitle==1.3.2
shortuuid==1.0.11
six==1.16.0
smmap==5.0.0
sniffio==1.3.0
starlette==0.27.0
termcolor==2.3.0
threadpoolctl==3.1.0
tifffile==2023.4.12
tokenizers==0.13.3
toolz==0.12.0
torch==1.13.1
TorchSnooper==0.8
torchvision==0.2.1
tqdm==4.65.0
transformers==4.27.4
triton==2.0.0.post1
typing-inspect==0.8.0
typing_extensions==4.5.0
tzdata==2023.3
uc-micro-py==1.0.2
urllib3==1.26.15
uvicorn==0.22.0
wandb==0.12.21
wcwidth==0.2.6
webdataset==0.2.48
websockets==11.0.3
xformers==0.0.16
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0

Here is the environment I used, for your information.
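
Comparing this list with the one posted earlier narrows things down to the packages most likely to matter here (torch, torchvision, xformers). A small sketch for that comparison, assuming each package sits on its own line in either pip freeze (name==version) or pip list (name version) format. Editable -e lines are skipped, so the diffusers fork shows up as missing on the pip freeze side. The function names are hypothetical.

```python
def parse_requirements(text):
    """Parse pip freeze (name==version) or pip list (name version) lines into a dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("-e ", "#")):
            continue  # skip blanks, comments, and editable installs
        if "==" in line:
            name, version = line.split("==", 1)
        else:
            parts = line.split()
            if len(parts) < 2:
                continue  # not a name/version pair
            name, version = parts[0], parts[1]
        # pip treats names case-insensitively and "_" the same as "-"
        packages[name.lower().replace("_", "-")] = version.strip()
    return packages


def diff_environments(a, b, watch=("torch", "torchvision", "xformers",
                                   "diffusers", "accelerate", "transformers")):
    """Return {package: (version_in_a, version_in_b)} for watched packages that differ."""
    env_a, env_b = parse_requirements(a), parse_requirements(b)
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in watch
        if env_a.get(name) != env_b.get(name)
    }


ours = "torch 2.0.0\nxformers 0.0.17\naccelerate 0.18.0"
reference = "torch==1.13.1\nxformers==0.0.16\naccelerate==0.18.0"
print(diff_environments(ours, reference))
# {'torch': ('2.0.0', '1.13.1'), 'xformers': ('0.0.17', '0.0.16')}
```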

Jul 06 '23 17:07 JingyeChen

This is the command I used:

export CUDA_VISIBLE_DEVICES=0,1,2,3
export PYTHONPATH=/home/twkim/project/textdiffuser
accelerate launch --main_process_port 2941 train.py \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --num_train_epochs=2000000 \
  --max_train_steps=2000000000 \
  --learning_rate=1e-5 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="noaccum_batch4" \
  --enable_xformers_memory_efficient_attention \
  --dataloader_num_workers=4 \
  --character_aware_loss_lambda=0.01 \
  --drop_caption \
  --mask_all_ratio=0.5 \
  --segmentation_mask_aug \
  --dataset_path="/data/twkim/diffusion/ocr-dataset/mario_laion_sampled" \
  --train_dataset_index_file=laion_ocr_index.txt \
  --vis_num=4 --vis_interval=500 \
  --checkpointing_steps 5000
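
With these flags, the effective (global) batch per optimizer step is train_batch_size × gradient_accumulation_steps × number of processes. A minimal sketch, assuming accelerate starts one process per GPU listed in CUDA_VISIBLE_DEVICES (in practice the count comes from the accelerate config); the helper names are hypothetical. Under data-parallel training each GPU holds a full model replica but processes only its own train_batch_size samples, so per-GPU memory at batch size 1 would be expected to stay close to the single-GPU batch-size-1 figure, plus some communication-buffer overhead.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes):
    """Samples contributing to one optimizer step across all processes."""
    return train_batch_size * gradient_accumulation_steps * num_processes


def count_visible_gpus(cuda_visible_devices):
    """Count devices in a CUDA_VISIBLE_DEVICES-style string such as '0,1,2,3'."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


# Values from the launch command; one process per visible GPU is an assumption.
gpus = count_visible_gpus("0,1,2,3")
print(effective_batch_size(1, 1, gpus))  # 4
```

Under that assumption the product is 4, which matches the noaccum_batch4 output directory name.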

It won't fit on a 24GB GPU. With batch size 4 on a single GPU, it takes 22GB. Is this normal?

Thanks.

Jul 06 '23 17:07 other-ones

Maybe you can try this

Jul 06 '23 17:07 JingyeChen

Hi,

Thanks for the suggestion. I downgraded xformers from 0.0.17 to 0.0.16, the same version as in your environment, and memory consumption decreased, but the output became complete noise again. I also configured accelerate as follows:

In which compute environment are you running? This machine
Which type of machine are you using? multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Do you wish to optimize your script with torch dynamo?[yes/NO]:no
Do you want to use DeepSpeed? [yes/NO]: no
Do you want to use FullyShardedDataParallel? [yes/NO]: no
Do you want to use Megatron-LM ? [yes/NO]: no
How many GPU(s) should be used for distributed training? [1]:3
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:1,2,3
Do you wish to use FP16 or BF16 (mixed precision)? fp16

Jul 06 '23 18:07 other-ones

Could you also let me know which GPU model you used for training? I see this message in train.py:

"xFormers 0.0.16 cannot be used for training in some GPUs. If you observe problems during training, please update xFormers to at least 0.0.17. See https://huggingface.co/docs/diffusers/main/en/optimization/xformers for more details."

I've been using an NVIDIA GeForce RTX 3090 for training. Could this be the reason? Also, if I upgrade xformers to 0.0.17 as suggested, memory consumption becomes very high (almost doubled). I suspect that xformers is not being applied properly in that case.
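
The warning quoted above depends on the installed xFormers version. A sketch of an equivalent check, assuming the packaging library (listed in both environments) and the standard-library importlib.metadata. It mirrors the quoted message but is not necessarily how train.py implements the check, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version as installed_version

from packaging import version

WARNING = (
    "xFormers 0.0.16 cannot be used for training in some GPUs. If you observe problems "
    "during training, please update xFormers to at least 0.0.17."
)


def xformers_training_warning(xformers_version):
    """Return the warning text for xFormers 0.0.16, or None for any other version."""
    # Compare only the release numbers, ignoring local build tags such as "+cu117".
    if version.parse(xformers_version).release == (0, 0, 16):
        return WARNING
    return None


def check_installed_xformers():
    """Apply the check to the installed xFormers, if any."""
    try:
        return xformers_training_warning(installed_version("xformers"))
    except PackageNotFoundError:
        return None  # xFormers is not installed


print(xformers_training_warning("0.0.17"))  # None
```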

Jul 06 '23 18:07 other-ones

Same settings as yours. I notice that you are using torch 2.0. You could try downgrading PyTorch (e.g., to a version between 1.10 and 1.13) and installing a compatible xformers version accordingly.

Jul 06 '23 23:07 JingyeChen

Hi, I tried that before with xformers==0.0.16 and torch==1.13.1, but then the model does not get trained (the output is complete noise).

- With xformers==0.0.16 and torch==1.13.1, the model does not get trained. I think xformers 0.0.16 does not work for training on the RTX 3090, as stated at https://huggingface.co/docs/diffusers/main/en/optimization/xformers.
- With xformers>=0.0.17 and torch>=2.0, the model gets trained, but memory consumption becomes extremely high.

I suspect it would work properly if the xformers-related part of the code were updated. Are there any plans to make the code compatible with newer versions of xformers (0.0.17 or later)?

Jul 07 '23 01:07 other-ones

Maybe you can try this: https://github.com/huggingface/diffusers/issues/2234#issuecomment-1416931212. We currently have no plans to adapt the code for xformers.

Jul 07 '23 04:07 JingyeChen
