
Can't download files larger than /tmp mount size

Open • geerlingguy opened this issue 3 weeks ago • 2 comments

Describe the bug

I am trying to download some large files to a Raspberry Pi using files.download(). With the default filesystem layout, /tmp on the Pi is a tmpfs mount, and on my 16 GB Pi CM5 that leaves only about 8 GB at that path:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.8G     0  7.8G   0% /dev
tmpfs           3.2G  9.8M  3.2G   1% /run
/dev/sda2       906G   22G  848G   3% /
tmpfs           7.9G  468K  7.9G   1% /dev/shm
tmpfs           5.0M   16K  5.0M   1% /run/lock
tmpfs           1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
tmpfs           7.9G  7.9G     0 100% /tmp

When downloading files larger than that (e.g. 10 GB LLM models), pyinfra eventually fails.
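The failure comes down to capacity: as the title says, the download has to fit under /tmp, and the df output above shows that tmpfs at 100% with 0 bytes available while / still has 848G free. A minimal standard-library sketch of a pre-flight free-space check, run on the target host; `fits_on` is a hypothetical helper, not a pyinfra API, and the 10 GiB size is only illustrative:

```python
import shutil


def fits_on(path: str, size_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `size_bytes` free."""
    # shutil.disk_usage reports total/used/free bytes for the mount that contains `path`
    return shutil.disk_usage(path).free >= size_bytes


if __name__ == "__main__":
    model_size = 10 * 1024**3  # illustrative: roughly a 10 GB model file
    for mount in ("/tmp", "/"):
        status = "fits" if fits_on(mount, model_size) else "does NOT fit"
        print(f"{mount}: {status}")
```

On the Pi above, this would report /tmp as not fitting and / as fitting, which is consistent with the curl write failure (exit code 23) in this report.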

To Reproduce

On a Raspberry Pi (or any other system with a small /tmp), use the following pyinfra operation to download a large file:

from pyinfra.operations import files

# Large download; fails on hosts whose /tmp is smaller than the file
files.download(
    name="Downloading model: gpt-oss-20b-Q4_K_M.gguf",
    src="https://huggingface.co/unsloth/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-Q4_K_M.gguf",
    dest="~/Downloads/llama.cpp/models/gpt-oss-20b-Q4_K_M.gguf",
)

Expected behavior

The file should be downloaded to ~/Downloads/llama.cpp/models/gpt-oss-20b-Q4_K_M.gguf.

Instead, I get:

    Starting nested operation: Downloading model: gpt-oss-20b-Q4_K_M.gguf (file 1 of 1) 
    [cm5.local] nested curl: (23) Failure writing output to destination, passed 15495 returned 0
    [cm5.local] nested Error: executed 0 commands

Meta

$ pyinfra --support

    If you are having issues with pyinfra or wish to make feature requests, please
    check out the GitHub issues at https://github.com/Fizzadar/pyinfra/issues .
    When adding an issue, be sure to include the following:

    System: Darwin
      Platform: macOS-15.6.1-arm64-arm-64bit-Mach-O
      Release: 24.6.0
      Machine: arm64
    pyinfra: v3.2
      black: v25.1.0
      click: v8.1.8
      distro: v1.9.0
      gevent: v24.11.1
      importlib_metadata: v8.6.1
      jinja2: v3.1.6
      packaging: v24.2
      paramiko: v3.5.1
      python-dateutil: v2.9.0.post0
      pywinrm: v0.5.0
      pyyaml: v6.0.2
      setuptools: v80.9.0
      typeguard: v4.4.2
      typing-extensions: v4.13.2
      wheel: v0.45.1
    Executable: /opt/homebrew/bin/pyinfra
    Python: 3.13.7 (CPython, Clang 17.0.0 (clang-1700.0.13.3))

geerlingguy • Nov 28 '25 16:11

One potential workaround would be to set TMPDIR as an environment variable for curl.

geerlingguy • Nov 28 '25 16:11

Another option would be to set config.TEMP_DIR (https://github.com/pyinfra-dev/pyinfra/blob/a43146afc6a3c2b34078b74651a81f89e5c7c02b/src/pyinfra/api/config.py#L28).

But I think a nicer option would be an additional _temp_dir global argument (which would fall back to the config).
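The proposed precedence is: a per-operation `_temp_dir` argument wins when given; otherwise the global `TEMP_DIR` setting applies. A small standard-library sketch of that fallback, assuming a hypothetical `resolve_temp_dir` helper, a hypothetical `temp_path_for` naming scheme, and a `/tmp` default (the location seen in this report). pyinfra's actual implementation may differ:

```python
import hashlib
import posixpath
from typing import Optional

# Assumption: default matches the /tmp location seen in this report
DEFAULT_TEMP_DIR = "/tmp"


def resolve_temp_dir(op_temp_dir: Optional[str], config_temp_dir: Optional[str]) -> str:
    """Hypothetical: per-operation value first, then the global config, then the default."""
    if op_temp_dir:
        return op_temp_dir
    if config_temp_dir:
        return config_temp_dir
    return DEFAULT_TEMP_DIR


def temp_path_for(src: str, temp_dir: str) -> str:
    """Hypothetical naming scheme: one stable temp file per source URL."""
    digest = hashlib.sha1(src.encode("utf-8")).hexdigest()
    # Target hosts here are Linux, so join with POSIX separators regardless of the local OS
    return posixpath.join(temp_dir, f"download-{digest}")
```

With a hypothetical `_temp_dir="/home/pi/tmp"` on the download operation, `resolve_temp_dir("/home/pi/tmp", None)` returns that path, so the partial file would be written to the 906G root filesystem instead of the 7.9G tmpfs, while every other operation keeps using `TEMP_DIR`.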

Fizzadar • Nov 29 '25 09:11

@Fizzadar let me know

@geerlingguy you can backport locally if you want

wowi42 • Dec 16 '25 16:12

Thanks! This will help on my Pis, where my poor tmp dir is always 8GB and I'm often downloading rather large things for testing, like AI models that are 30+ GB :D

geerlingguy • Dec 17 '25 05:12