glid-3-xl-stable

Does it work with CPU too?

Open andreae293 opened this issue 2 years ago • 3 comments

Hi, has anyone ever tried to train on CPU? I know it will be super slow, but I tried it for the fun of it.

I disabled my GPU by adding this line to image_train_stable.py: torch.cuda.is_available = lambda : False. With that change, training fails with:

Traceback (most recent call last):
  File "scripts\image_train_stable.py", line 157, in <module>
    main()
  File "scripts\image_train_stable.py", line 85, in main
    TrainLoop(
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 194, in run_loop
    self.run_step(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 208, in run_step
    self.forward_backward(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 236, in forward_backward
    losses = compute_losses()
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 96, in training_losses
    return super().training_losses(self._wrap_model(model), *args, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\gaussian_diffusion.py", line 1137, in training_losses
    model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 133, in __call__
    return self.model(x, new_ts, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 880, in forward
    h = module(h, emb, context)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 217, in forward
    x = layer(x)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

Sorry for bothering you with a useless question, but am I doing something wrong? Thanks.

Edit: never mind. I removed both .half() calls from image_train_stable.py and deleted --use_fp16 from the training arguments.

This way I was able to train on CPU.
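
The fix above amounts to requesting half precision only when a CUDA device is actually in use, since the traceback shows the CPU convolution kernel rejecting 'Half' tensors. A minimal sketch of that decision in plain Python follows. `resolve_precision` and its arguments are hypothetical names for illustration, not part of this repository; in a real script, `cuda_available` would presumably come from `torch.cuda.is_available()`.

```python
def resolve_precision(cuda_available: bool, want_fp16: bool) -> dict:
    """Pick device and dtype names, falling back to float32 on CPU.

    Hypothetical helper for illustration; not taken from the repository.
    """
    device = "cuda" if cuda_available else "cpu"
    # Keep half precision only when a GPU is present. On CPU, the error in
    # this thread shows the conv kernel has no 'Half' implementation.
    use_fp16 = want_fp16 and cuda_available
    return {
        "device": device,
        "dtype": "float16" if use_fp16 else "float32",
        "use_fp16": use_fp16,
    }


if __name__ == "__main__":
    # CPU-only run that asked for fp16: the request is downgraded.
    print(resolve_precision(cuda_available=False, want_fp16=True))
```

With a result like this, a training script could call .half() on the model only when `use_fp16` is true, instead of unconditionally.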

andreae293, Sep 11 '22 10:09

@andreae293 can you please provide more details? I tried:

export CUDA_VISIBLE_DEVICES=-1
MODEL_FLAGS="--actual_image_size 512 --lr_warmup_steps 10000 --ema_rate 0.9999 --attention_resolutions 64,32,16 --class_cond False --diffusion_steps 1000 --image_size 64 --learn_sigma False --noise_schedule linear --num_channels 320 --num_heads 8 --num_res_blocks 2 --resblock_updown False --use_fp16 False --use_scale_shift_norm False "
TRAIN_FLAGS="--lr 5e-5 --batch_size 32 --log_interval 10 --save_interval 5000 --kl_model kl.pt --resume_checkpoint diffusion.pt"
export OPENAI_LOGDIR=./logs/
python scripts/image_train_stable.py --data_dir /path/to/image-and-text-files $MODEL_FLAGS $TRAIN_FLAGS

where I changed the README default from --use_fp16 True to --use_fp16 False (and IIUC there is no need to remove .half() from image_train_stable.py with this flag), but it gives:

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

i.e., it still seems to use the GPU instead of the CPU.
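
One way to narrow this down is to check whether the CUDA_VISIBLE_DEVICES value actually reaches the Python process. The sketch below launches a child interpreter with the variable set and reports what the child sees; `run_with_hidden_gpus` is a hypothetical helper name. It only verifies the environment. If the training script or one of its modules overwrites the variable or selects a device explicitly, which this thread does not confirm either way, the GPU would still be used.

```python
import os
import subprocess
import sys


def run_with_hidden_gpus(code: str, value: str = "-1") -> str:
    """Run `code` in a child interpreter with CUDA_VISIBLE_DEVICES set.

    Hypothetical diagnostic helper; returns the child's stdout, stripped.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = value
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    seen = run_with_hidden_gpus(
        "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"
    )
    print("child sees CUDA_VISIBLE_DEVICES =", seen)
```

Printing the same variable at the start of the training script and again just before the model is moved to a device would show whether the value changes during startup.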

timotheecour4, Oct 19 '22 23:10

Stable Diffusion requires CUDA to run the AI, as it is the language for communicating with the GPU and performing the necessary calculations. Running on the CPU would require a complete rewrite or virtualization, which would need more RAM and money than it would take to buy a supported CUDA GPU. That said, if anyone reading this is willing, would it be possible to use a TPU from Kaggle or Google Colab instead? I feel like it might be more efficient than a GPU or CPU, as it is meant for processing tensors directly.

ghost, Oct 24 '22 19:10

@timotheecour4 Sorry for the late response. If you don't have enough RAM, you have to dedicate a lot of GB to virtual memory (paging file or swap). I don't know the minimum required by this repo, but I dedicated 100 GB of virtual memory. Also, if you are trying to fine-tune, I suggest looking into DreamBooth for Stable Diffusion.

@TheRealUnBot Stable Diffusion does not necessarily require CUDA-capable hardware to run. Since it's based on PyTorch, it runs just fine on the CPU, with the downside of being 50x slower.

andreae293, Oct 25 '22 19:10