Cannot start on Ubuntu 24.04: "RuntimeError: unable to mmap ... Cannot allocate memory (12)"
I'm very new to this, and it's possible that I am missing the obvious.
Ubuntu 24.04
$ uname -a
Linux benj-pc 6.8.0-40-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul 5 10:34:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ python -V
Python 3.10.14
$ lspci -vnn | grep -A 12 '\[030[02]\]' | grep -Ei "vga|3d|display|kernel"
0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c7) (prog-if 00 [VGA controller])
Kernel driver in use: amdgpu
Kernel modules: amdgpu
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        10Gi       1.2Gi       206Mi        19Gi        20Gi
Swap:          8.0Gi       512Ki       8.0Gi
I followed the README instructions, and I get the following:
$ python -m flux --name flux-schnell --loop
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Traceback (most recent call last):
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 575, in load_state_dict
return torch.load(
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/torch/serialization.py", line 1087, in load
overall_storage = torch.UntypedStorage.from_file(os.fspath(f), shared, size)
RuntimeError: unable to mmap 44541587809 bytes from file </home/benj/.cache/huggingface/hub/models--google--t5-v1_1-xxl/snapshots/3db67ab1af984cf10548a73467f0e5bca2aaaeb2/pytorch_model.bin>: Cannot allocate memory (12)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 584, in load_state_dict
if f.read(7) == "version":
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/benj/workspace/flux/src/flux/__main__.py", line 4, in <module>
app()
File "/home/benj/workspace/flux/src/flux/cli.py", line 250, in app
Fire(main)
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/benj/workspace/flux/src/flux/cli.py", line 158, in main
t5 = load_t5(torch_device, max_length=256 if name == "flux-schnell" else 512)
File "/home/benj/workspace/flux/src/flux/util.py", line 131, in load_t5
return HFEmbedder("google/t5-v1_1-xxl", max_length=max_length, torch_dtype=torch.bfloat16).to(device)
File "/home/benj/workspace/flux/src/flux/modules/conditioner.py", line 18, in __init__
self.hf_module: T5EncoderModel = T5EncoderModel.from_pretrained(version, **hf_kwargs)
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3738, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "/home/benj/workspace/flux/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 596, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home/benj/.cache/huggingface/hub/models--google--t5-v1_1-xxl/snapshots/3db67ab1af984cf10548a73467f0e5bca2aaaeb2/pytorch_model.bin' at '/home/benj/.cache/huggingface/hub/models--google--t5-v1_1-xxl/snapshots/3db67ab1af984cf10548a73467f0e5bca2aaaeb2/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
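For context, the crash happens before anything reaches the GPU: transformers is deserializing the ~44 GB T5-XXL text-encoder checkpoint (pytorch_model.bin) into host memory, and the mmap fails. A minimal sketch that isolates just that step, with the model id and dtype taken from the traceback and everything else assumed:
import torch
from transformers import T5EncoderModel

# Loads only the T5 text encoder that the traceback above dies on.
# If this also fails with "Cannot allocate memory (12)", the bottleneck is
# host memory while mapping/deserializing the ~44 GB pytorch_model.bin.
enc = T5EncoderModel.from_pretrained(
    "google/t5-v1_1-xxl",
    torch_dtype=torch.bfloat16,  # same dtype flux's load_t5() passes in
)
print(f"loaded {sum(p.numel() for p in enc.parameters()) / 1e9:.1f}B parameters")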
RTX 4070 12G - Same problem.
At first, something takes a long time to load and takes up 9 GB of GPU memory; then the message "Loading checkpoint" appears, followed by:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacity of 11.73 GiB of which 52.81 MiB is free
You both do not have enough VRAM. This model, even the smaller one, is still very large; as far as I understand, you really do need something like an A100 to run it. I cannot run it on my 4070 Super either, because UVM doesn't work on NVIDIA's drivers (it should work, but it doesn't), and like you, I run out of VRAM.
So if you do not have enough VRAM, or working "shared memory" (NVIDIA's is broken: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/663), you won't be able to run it on the GPU.
I can't tell you how to get shared memory working on AMD; I have never had an AMD card. Good luck.
If you want to try it without the GPU, python -m flux --name flux-schnell --loop --device cpu will let you run it. However, as you can imagine, it is very, very slow. An integrated GPU can also be used if you have one; these usually have good shared memory support.
Unfortunately, shared memory usually works fine on Windows, so you may have better luck with that + WSL2.
The real solution is for the graphics card manufacturers to fix the shared memory implementation in their Linux kernel drivers. You can send them a strongly worded email :smiley:
I have a 3060 12GB, driver version 555.58.02, and I am running Pop!_OS. Using the provided scripts gave me the same out-of-memory error; however, I was able to run it with the following script in the same venv.
import torch
from diffusers import FluxPipeline

torch.cuda.empty_cache()

# Load in bfloat16 and offload to CPU so the pipeline fits in 12 GB of VRAM.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()  # moves submodules to the GPU one at a time; slow, but low peak VRAM

prompt = "Your Prompt Here"
out = pipe(
    prompt=prompt,
    guidance_scale=1.5,
    height=768,
    width=1360,
    num_inference_steps=7,
).images[0]
out.save("image.png")
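If you specifically want the schnell checkpoint the original report was running, the same offloading approach should work. Here is an untested sketch; the repo id and the schnell-specific settings (no guidance, few steps, 256-token prompt limit) come from the FLUX.1-schnell model card and may need adjusting:
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()  # same trick: low peak VRAM at the cost of speed

out = pipe(
    prompt="Your Prompt Here",
    guidance_scale=0.0,        # schnell is guidance-distilled, so guidance is off
    height=768,
    width=1360,
    num_inference_steps=4,     # schnell is a few-step model
    max_sequence_length=256,
).images[0]
out.save("image_schnell.png")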
Hope this helps.
Thanks, man. It sucks that the startup examples given in the README don't take this into account; then again, it's an effective gatekeeping method (it certainly gatekept me!)