
Not enough GPU memory even though there is?

CasualVult opened this issue 1 year ago • 39 comments

I have a 6700 XT. It has more than enough VRAM, yet I'm getting this error, even after applying the fix where you allocate 8 GB instead of 1.

```
[Fooocus Model Management] Moving model(s) has taken 59.70 seconds
  0%| | 0/30 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 803, in worker
    handler(task)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 735, in handler
    imgs = pipeline.process_diffusion(
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
    sampled_latent = core.ksampler(
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\core.py", line 315, in ksampler
    samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\sample.py", line 100, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 711, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\sample_hijack.py", line 151, in sample_hacked
    samples = sampler.sample(model_wrap, sigmas, extra_args, callback_wrap, noise, latent_image, denoise_mask, disable_pbar)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 556, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 701, in sample_dpmpp_2m_sde_gpu
    return sample_dpmpp_2m_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, solver_type=solver_type)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 613, in sample_dpmpp_2m_sde
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 329, in patched_KSamplerX0Inpaint_forward
    out = self.inner_model(x, sigma,
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 267, in forward
    return self.apply_model(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 264, in apply_model
    out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 252, in sampling_function
    cond, uncond = calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 230, in calc_cond_uncond_batch
    output = model.apply_model(input_x, timestep, **c).chunk(batch_chunks)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_base.py", line 68, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 459, in patched_unet_forward
    h = forward_timestep_embed(module, h, emb, context, transformer_options, output_shape)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\openaimodel.py", line 37, in forward_timestep_embed
    x = layer(x, context, transformer_options)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 560, in forward
    x = block(x, context=context[i], transformer_options=transformer_options)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 390, in forward
    return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\util.py", line 123, in checkpoint
    return func(*inputs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 455, in _forward
    n = self.attn1(n, context=context_attn1, value=value_attn1)
  File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 366, in forward
    out = optimized_attention(q, k, v, self.heads)
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 177, in attention_sub_quad
    hidden_states = efficient_dot_product_attention(
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 244, in efficient_dot_product_attention
    res = torch.cat([
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 245, in <listcomp>
    compute_query_chunk_attn(
  File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 160, in _get_attention_scores_no_kv_chunking
    attn_probs = attn_scores.softmax(dim=-1)
RuntimeError: Could not allocate tensor with 165150720 bytes. There is not enough GPU video memory available!
Total time: 77.58 seconds
```

CasualVult avatar Dec 08 '23 13:12 CasualVult
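For context on why a ~160 MB allocation can fail on a card that "has more than enough VRAM": the sub-quadratic attention path at the bottom of the trace must materialize the full attention score tensor for the softmax, on top of the model weights, activations, and whatever DirectML has already reserved. A rough back-of-the-envelope sketch in Python; the shapes are illustrative guesses that happen to reproduce the logged byte count, not values read from the log:

```python
# Size of the score tensor that _get_attention_scores_no_kv_chunking must
# allocate in one contiguous piece before calling softmax on it.
def attn_scores_bytes(batch: int, heads: int, q_len: int, kv_len: int,
                      bytes_per_element: int = 4) -> int:
    # scores shape: (batch * heads, q_len, kv_len), fp32 assumed
    return batch * heads * q_len * kv_len * bytes_per_element

# Hypothetical chunk: batch 2 (cond + uncond), 10 heads, a 1024-query chunk,
# 2016 key/value tokens -- one factorization that yields the logged number.
print(attn_scores_bytes(2, 10, 1024, 2016))  # 165150720 bytes (~157.5 MB)
```

The failing allocation is only the last straw: a card can look "big enough" on paper and still have no free block of that size left at this point in sampling.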

Same issue

stainz2004 avatar Dec 08 '23 13:12 stainz2004

Same here, with a 5700 XT 8 GB.

AlexeyJersey avatar Dec 08 '23 14:12 AlexeyJersey

AMD cards are running into a memory loop issue. I think the devs are aware; whether they're going to do anything about it, I'm not sure.

xjbar avatar Dec 08 '23 15:12 xjbar

Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

acvcleitao avatar Dec 08 '23 15:12 acvcleitao

> Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

It's been broken for 2 months? That's disappointing.

Something somewhere almost seems hard-coded to that memory amount; I get the same error. You can't even tell it to use CPU mode, and if you try --lowvram, it assumes you want NVIDIA again.

TDola avatar Dec 08 '23 18:12 TDola

> Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

RX 5700 XT 8 GB

```
RuntimeError: Could not allocate tensor with 26214400 bytes. There is not enough GPU video memory available!
Total time: 67.72 seconds
```

AlexeyJersey avatar Dec 08 '23 19:12 AlexeyJersey

There have been some posts related to this issue. Some versions have this corrected, some don't. Rolling back to the commit I mentioned solved it for me. Supposedly this was solved in 2.1.695 and then again in 2.1.703, which were both released around the 18th of October. I'm having the same issues with exactly the same setup as the guy in #700 (RTX 2060 6G). I don't know what to do.

Also, something that's different between your error and mine is that my image generates almost fully; I only get the error after the image generation completes. It doesn't save it because it tries to move the model and then crashes. You can't even get past the generation process, right? It just straight up crashes. So my issue is probably due to CUDA, and yours due to memory allocation at the start of the image generation.

acvcleitao avatar Dec 09 '23 11:12 acvcleitao

> Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

Sorry for the stupid question, but how do I roll back Fooocus?

tobiasklnn avatar Dec 10 '23 16:12 tobiasklnn

Same issue here

heltonteixeira avatar Dec 10 '23 19:12 heltonteixeira

> Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

> Sorry for the stupid question, but how do I roll back Fooocus?

You can use these commands:

```
git checkout 9660daff94b4d0f282567b96b3d387817818a4b3
python entry_with_update.py
```

PierreLepagnol avatar Dec 10 '23 21:12 PierreLepagnol
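One caveat with these commands: entry_with_update.py tries to pull the latest commit on every launch, which can undo the checkout. If the goal is to stay pinned to that commit, launching with `python launch.py` instead should skip the self-update step (assuming launch.py exists at that commit; it is the script entry_with_update.py hands off to in current versions).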

When I try this branch on a clean install, I get

File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\cuda_init_.py", line 239, in _lazy_init raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled

grendahl06 avatar Dec 10 '23 21:12 grendahl06

The latest version is tested with a 2060. If it crashes, check that you have enough system swap and the latest NVIDIA driver. If it still does not work, paste full logs.

lllyasviel avatar Dec 10 '23 21:12 lllyasviel

> Torch not compiled with CUDA enabled

Thank you for the response. I think most of the people in this thread have Radeon GPUs.

If it helps to have the full message, this is what I am seeing:

```
Traceback (most recent call last):
  File "threading.py", line 1016, in _bootstrap_inner
  File "threading.py", line 953, in run
  File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 18, in worker
    import modules.default_pipeline as pipeline
  File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 258, in <module>
    refresh_everything(
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 253, in refresh_everything
    prepare_text_encoder(async_call=True)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 217, in prepare_text_encoder
    fcbh.model_management.load_models_gpu([final_clip.patcher, final_expansion.patcher])
  File "D:\Fooocus_win64\Fooocus\modules\patch.py", line 479, in patched_load_models_gpu
    y = fcbh.model_management.load_models_gpu_origin(*args, **kwargs)
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 402, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 294, in model_load
    accelerate.dispatch_model(self.real_model, device_map=device_map, main_device=self.device)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\big_modeling.py", line 371, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 536, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 536, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 506, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 253, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\utils\modeling.py", line 292, in set_module_tensor_to_device
    new_value = old_value.to(device)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```

grendahl06 avatar Dec 10 '23 21:12 grendahl06

"Torch not compiled with CUDA enabled" means user mistake and users do not follow official installation guide.

lllyasviel avatar Dec 10 '23 21:12 lllyasviel
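For anyone hitting this: "Torch not compiled with CUDA enabled" means the torch wheel that ended up installed is a CPU-only build, so any attempt to move a tensor to CUDA raises. A quick sanity check using plain PyTorch APIs (nothing Fooocus-specific), run with the same interpreter Fooocus uses:

```python
# e.g. save as check_torch.py and run: .\python_embeded\python.exe check_torch.py
import torch

print(torch.__version__)          # a "+cpu" suffix means a CPU-only build
print(torch.version.cuda)         # None on builds without CUDA support
print(torch.cuda.is_available())  # False -> the assertion above is expected
```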

I do have an NVIDIA GeForce RTX 2060 with 6 GB. Do you think I can run the model?

I'm missing 20 MB... is there a way to quantize the UNet weights?

PierreLepagnol avatar Dec 10 '23 21:12 PierreLepagnol

The latest version is tested with a 2060. If it crashes, check that you have enough system swap and the latest NVIDIA driver. If it still does not work, paste full logs.

lllyasviel avatar Dec 10 '23 21:12 lllyasviel

Reverted all of my local changes and re-ran the setup. I am back to my original error message, if this helps: Radeon 6650 XT 8 GB, 32 GB RAM, AMD 7950.

```
Traceback (most recent call last):
  File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 803, in worker
    handler(task)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 735, in handler
    imgs = pipeline.process_diffusion(
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
    sampled_latent = core.ksampler(
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus_win64\Fooocus\modules\core.py", line 315, in ksampler
    samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\sample.py", line 93, in sample
    real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\sample.py", line 86, in prepare_sampling
    fcbh.model_management.load_models_gpu([model] + models, model.memory_required(noise_shape) + inference_memory)
  File "D:\Fooocus_win64\Fooocus\modules\patch.py", line 494, in patched_load_models_gpu
    y = fcbh.model_management.load_models_gpu_origin(*args, **kwargs)
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 410, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 293, in model_load
    raise e
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 289, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_patcher.py", line 191, in patch_model
    temp_weight = fcbh.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
  File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 532, in cast_to_device
    return tensor.to(device, copy=copy).to(dtype)
RuntimeError: Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
Total time: 24.52 seconds
```

grendahl06 avatar Dec 10 '23 21:12 grendahl06
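One detail worth noting in this trace: it fails inside patch_model, which copies each weight to the GPU as float32 (cast_to_device(weight, device_to, torch.float32, copy=True)), so weights stored as fp16 roughly double in size on the way in. A minimal sketch; the tensor shape is a hypothetical example chosen only because it reproduces the logged byte count:

```python
import torch

# Hypothetical layer weight as stored in an fp16 checkpoint.
weight_fp16 = torch.empty(5120, 5760, dtype=torch.float16)

# patch_model casts the copy to float32, i.e. 4 bytes per element.
fp32_bytes = weight_fp16.numel() * 4
print(fp32_bytes)  # 117964800 bytes (~112.5 MB) for this single tensor
```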

```
(fooocus) [pierre@archlinux Fooocus]$ python entry_with_update.py --gpu-only --bf16-unet --bf16-vae --use-pytorch-cross-attention
Update failed.
'refs/heads/HEAD'
Update succeeded.
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Fooocus version: 2.1.703
Running on local URL: http://127.0.0.1:7860

Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB

To create a public link, set share=True in launch().
Opening in existing browser session.
Total VRAM 5927 MB, total RAM 15861 MB
Set vram state to: HIGH_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
[Fooocus] Disabling smart memory
model_type EPS
adm 2560
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
loaded straight to GPU
Requested to load SDXLRefiner
Loading 1 new model
Refiner model loaded: /home/pierre/Documents/Fooocus/models/checkpoints/sd_xl_refiner_1.0_0.9vae.safetensors
Exception in thread Thread-2 (worker):
Traceback (most recent call last):
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pierre/Documents/Fooocus/modules/async_worker.py", line 18, in worker
    import modules.default_pipeline as pipeline
  File "/home/pierre/Documents/Fooocus/modules/default_pipeline.py", line 258, in <module>
    refresh_everything(
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/pierre/Documents/Fooocus/modules/default_pipeline.py", line 233, in refresh_everything
    refresh_base_model(base_model_name)
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/pierre/Documents/Fooocus/modules/default_pipeline.py", line 96, in refresh_base_model
    xl_base = core.load_model(filename)
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/pierre/Documents/Fooocus/modules/core.py", line 69, in load_model
    unet, clip, vae, clip_vision = load_checkpoint_guess_config(ckpt_filename, embedding_directory=embeddings_path)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/sd.py", line 427, in load_checkpoint_guess_config
    model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/supported_models.py", line 156, in get_model
    out = model_base.SDXL(self, model_type=self.model_type(state_dict, prefix), device=device)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/model_base.py", line 189, in __init__
    super().__init__(model_config, model_type, device=device)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/model_base.py", line 24, in __init__
    self.diffusion_model = UNetModel(**unet_config, device=device)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/diffusionmodules/openaimodel.py", line 446, in __init__
    layers.append(SpatialTransformer(
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 507, in __init__
    [BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim[d],
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 507, in <listcomp>
    [BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim[d],
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 353, in __init__
    self.ff = FeedForward(dim, dropout=dropout, glu=gated_ff, dtype=dtype, device=device, operations=operations)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 73, in __init__
    ) if not glu else GEGLU(dim, inner_dim, dtype=dtype, device=device, operations=operations)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 58, in __init__
    self.proj = operations.Linear(dim_in, dim_out * 2, dtype=dtype, device=device)
  File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ops.py", line 11, in __init__
    self.weight = torch.nn.Parameter(torch.empty((out_features, in_features), **factory_kwargs))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 5.79 GiB of which 19.81 MiB is free. Including non-PyTorch memory, this process has 5.76 GiB memory in use. Of the allocated memory 5.59 GiB is allocated by PyTorch, and 80.92 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

PierreLepagnol avatar Dec 10 '23 22:12 PierreLepagnol
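For the fragmentation case this log points at ("reserved by PyTorch but unallocated... try setting max_split_size_mb"), the knob is the PYTORCH_CUDA_ALLOC_CONF environment variable. A sketch of one way to set it; the 64 MB value is an illustrative guess, and this only applies to CUDA setups like this RTX 2060, not to DirectML:

```python
# Set the allocator option before CUDA is first used, e.g. at the very top
# of the entry script (exporting it in the shell before launching also works).
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:64")

import torch  # imported after the variable is in place
```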

If any AMD card says "Could not allocate tensor with X bytes. There is not enough GPU video memory available", then that AMD card is unfortunately not enough to run SDXL.

Current AMD support is still experimental and does not work as well as NVIDIA.

However, we promise the best support across all software; that is, if you are able to run SDXL successfully on the same AMD device in Automatic1111, ComfyUI, Invoke, SD.Next, etc., please let us know and we will support it.

However, if all software fails to run SDXL on your device, then we have no method to make it work.

Also, the Linux version of Fooocus uses ROCm and may have better support for AMD.

lllyasviel avatar Dec 10 '23 22:12 lllyasviel

> If any AMD card says "Could not allocate tensor with X bytes. There is not enough GPU video memory available", then that AMD card is unfortunately not enough to run SDXL.
>
> Current AMD support is still experimental and does not work as well as NVIDIA.
>
> However, we promise the best support across all software; that is, if you are able to run SDXL successfully on the same AMD device in Automatic1111, ComfyUI, Invoke, SD.Next, etc., please let us know and we will support it.
>
> However, if all software fails to run SDXL on your device, then we have no method to make it work.
>
> Also, the Linux version of Fooocus uses ROCm and may have better support for AMD.

Thank you for the answer. I will try to create a Linux VM later today or tomorrow. Do you recommend any specific flavor of Linux?

Great work. I'm looking forward to being able to use it successfully.

grendahl06 avatar Dec 10 '23 22:12 grendahl06

> (fooocus) [pierre@archlinux Fooocus]$ python entry_with_update.py --gpu-only --bf16-unet --bf16-vae --use-pytorch-cross-attention
>
> [full log quoted above]

This log indicates that you were misled by some bad tutorials and have already broken your env with wrong command flags. Please only trust the official installation guide and try a fresh install again.

lllyasviel avatar Dec 10 '23 22:12 lllyasviel

Sorry, but can someone tell me how to uninstall Fooocus from Windows? Is it enough to just delete the files I downloaded?

itoch avatar Dec 11 '23 17:12 itoch

I sure hope this gets fixed for AMD. With NVIDIA leaving the consumer graphics card space, we are left with AMD and Intel.

TDola avatar Dec 11 '23 18:12 TDola

> I sure hope this gets fixed for AMD. With NVIDIA leaving the consumer graphics card space, we are left with AMD and Intel.

Same. In the meantime, I've used the --cpu switch, which takes roughly 50 s/it on an AMD 7950. At 30 steps that ends up being ~25 minutes (30 × 50 s) to find out how the model interprets my prompt.

grendahl06 avatar Dec 11 '23 18:12 grendahl06

I went down the rabbit hole on this. SDXL claims they do not support AMD cards on Windows, and the Linux version is reportedly very broken, requiring specific versions based on your graphics card. Automatic1111 claims it does work on AMD; I have not tried it. ComfyUI makes the same claim, but the instructions are equally vague and there are reports that it's broken too. So I think we just have to wait for SDXL to fix their bugs, and it seems they have little incentive to do so; they, after all, can afford an NVIDIA card. All hope is not lost, however: just yesterday AMD released a DirectML update. It didn't fix this issue either, but it shows work is being done.

TDola avatar Dec 12 '23 13:12 TDola

> Sorry, but can someone tell me how to uninstall Fooocus from Windows? Is it enough to just delete the files I downloaded?

I would like to know that too. A nice clean uninstall :)

Stefan-Mayer avatar Dec 12 '23 19:12 Stefan-Mayer

Hi, deleting the folder is a clean uninstall if users followed the official installation guide.

lllyasviel avatar Dec 12 '23 19:12 lllyasviel

I think I've missed a step in the setup; the Windows AMD section says "see previous" in a recursive reference...

At any rate, the Linux AMD section says to run these commands:

```
pip uninstall torch torchvision torchaudio torchtext functorch xformers
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
```

When I enter the first, it shows I have the CUDA builds, which is clearly part of the problem.

When I run the second, it says there is no matching version:

```
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch
```

What do I need to add to this second command to get these dependencies?

Thank you

grendahl06 avatar Dec 12 '23 21:12 grendahl06

Hi,

For Windows AMD, please follow the section "Windows (AMD GPUs)".

For Linux AMD, please follow the section "Linux (AMD GPUs)".

If you see "Could not find a version that satisfies the requirement torch", then you are using the Linux guide on Windows, and that will not work.

lllyasviel avatar Dec 12 '23 21:12 lllyasviel

> Hi,
>
> For Windows AMD, please follow the section "Windows (AMD GPUs)".
>
> For Linux AMD, please follow the section "Linux (AMD GPUs)".
>
> If you see "Could not find a version that satisfies the requirement torch", then you are using the Linux guide on Windows, and that will not work.

Thank you for the fast answer. When I run the commands listed:

```
.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --always-no-vram
pause
```

it still tells me it is looking for CUDA.

In my reading of the Windows AMD install, it says to download and then run the commands copied above. If I do not need to download these extra .whl files, can you tell me what step I've missed that makes the code still look for CUDA dependencies?

The --always-no-vram switch appears to be new, but it always triggers the CUDA message.

Thank you

grendahl06 avatar Dec 12 '23 21:12 grendahl06