Andrew Suter-Morris
> Right, we do know that these tokens are frequently used, from the debug scenes we receive from customers. So it's something we must keep supporting. I created #1163 for...
Reading through modules\controlnet.py, there is a CannyFilter which should apply on the fly.
```
edges = [cv2.Canny(x[i].mul(255).permute(1, 2, 0).cpu().numpy().astype(np.uint8), 100, 200) for i in range(len(x))]
```
This means it does...
Are you using 1B or 3.6B? That will make a difference (though still pretty hefty).
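To put rough numbers on that difference, here is a back-of-the-envelope sketch. It assumes 2 bytes per parameter (bf16/fp16) and counts only the weights; gradients, optimizer state, and activations come on top, so treat these as lower bounds rather than actual training footprints. The function name is made up for illustration.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GiB (no gradients or activations)."""
    return num_params * bytes_per_param / 2**30


# Compare the two model sizes mentioned above, assuming bf16 weights.
for name, num_params in (("1B", 1e9), ("3.6B", 3.6e9)):
    print(f"{name}: ~{weight_memory_gib(num_params):.1f} GiB of weights")
```

That works out to roughly 1.9 GiB versus 6.7 GiB for the weights alone, which is why the smaller model helps but may still be heavy once training state is added.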
Per this: https://github.com/Stability-AI/StableCascade/issues/26
You likely need to downsize to the 1B model. Update your config:
```
model_version: 1B
```
and
```
generator_checkpoint_path: models/stage_c_lite_bf16.safetensors
```
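If you'd rather script the change than edit by hand, something like the following could work. It is only a sketch: it assumes the config is a flat file of top-level `key: value` lines, `override_config` is a hypothetical helper (not part of the StableCascade repo), and the starting config in the example, including `batch_size`, is invented for illustration.

```python
def override_config(text: str, overrides: dict[str, str]) -> str:
    """Replace top-level `key: value` lines, appending keys that are missing."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        key = line.split(":", 1)[0].strip()
        is_top_level = ":" in line and not line.startswith((" ", "#"))
        if is_top_level and key in remaining:
            out.append(f"{key}: {remaining.pop(key)}")
        else:
            out.append(line)  # comments, nested lines, and other keys pass through
    # Keys that were not present in the file are appended at the end.
    out.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical starting config; only the two overrides come from the advice above.
example = "model_version: 3.6B\nbatch_size: 4\n"
updated = override_config(
    example,
    {
        "model_version": "1B",
        "generator_checkpoint_path": "models/stage_c_lite_bf16.safetensors",
    },
)
print(updated)
```

For anything beyond a flat file, a real YAML parser would be the safer choice.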
Sure, here's the result ```log 2024-04-23 21:06:22 - train.train - INFO - Running ['accelerate', 'launch', '--multi_gpu', '--num_processes=4', '/home/ubuntu/src/mvp/backend/train/huggingface_train_scripts/train_controlnet_sdxl.py', '--pretrained_model_name_or_path', 'stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir', '/tmp/fai_cache/data/data', '--pretrained_vae_model_name_or_path', 'madebyollin/sdxl-vae-fp16-fix', '--output_dir', '/tmp/demo-20240423210622', '--resolution', '512', '--train_batch_size', '1',...
I tried running just that script to see. The dataset is quite tiny. No luck. CUDA out of memory still for multiple GPUs ``` File "/home/ubuntu/src/mvp/backend/train/huggingface_train_scripts/train_controlnet_webdataset_sdxl.py", line 1227, in main...
Ahh I understand now. I will give that a try. FWIW, I also looked at using deepspeed and other routes, all with the same result. But I'll pursue the embeddings...
We're going to have large datasets anyway, so I opted to switch to webdataset. I tried running the webdataset controlnet script, and it still fails. The test dataset itself is...
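For reproducing the run across machines, it can help to assemble the launch command as an argument list instead of a long shell string. This is a sketch only: `build_launch_command` is a hypothetical helper, and it uses just the `accelerate launch` flags and script arguments that already appear in the commands quoted in this thread. It does nothing about the out-of-memory error itself.

```python
import shlex


def build_launch_command(script: str, num_processes: int, script_args: dict) -> list[str]:
    """Assemble an `accelerate launch` invocation as an argument list."""
    cmd = [
        "accelerate", "launch",
        "--mixed_precision=fp16",
        f"--num_processes={num_processes}",
    ]
    if num_processes > 1:
        cmd.append("--multi_gpu")  # only meaningful with more than one process
    cmd.append(script)
    for flag, value in script_args.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


cmd = build_launch_command(
    "scripts/train_controlnet_webdataset.py",
    4,
    {
        "pretrained_model_name_or_path": "stabilityai/stable-diffusion-xl-base-1.0",
        "train_shards_path_or_url": "data",
    },
)
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Keeping the command as a list makes it easy to drop to a single process for a quick memory check without retyping everything.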
I switched to SageMaker (p3.8xlarge) here to see if there's any availability. Still failing. Guidance requested. ``` !accelerate launch --mixed_precision="fp16" --num_processes=4 --multi_gpu scripts/train_controlnet_webdataset.py \ --pretrained_model_name_or_path "stabilityai/stable-diffusion-xl-base-1.0" \ --train_shards_path_or_url data \...