rustup

Invalid cross-device link (os error 18) when upgrading on a docker OverlayFS

Open wraithan opened this issue 8 years ago • 22 comments

$ rustup update nightly
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
info: latest update on 2017-08-21, rust version 1.21.0-nightly (8c303ed87 2017-08-20)
info: downloading component 'rustc'
info: downloading component 'rust-std'
info: downloading component 'cargo'
info: downloading component 'rust-docs'
info: removing component 'rustc'
info: rolling back changes
error: could not rename component directory from '/root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/etc' to '/root/.rustup/tmp/x5u5mnp0hhtywco8_dir/bk'
info: caused by: Invalid cross-device link (os error 18)

std::fs::rename() basically doesn't work on OverlayFS. As far as I can tell from similar reports across various languages and projects, cross-device link errors on OverlayFS boil down to the use of the rename syscall.

I'd like to propose wrapping the std::fs::rename() calls so that, on Linux, if a call fails with os error 18, we attempt a copy and delete instead. Similar errors are reported periodically on other platforms; the wrapper could handle those OS cases too if they have a similar error code (or perhaps the same one, if it is standardized; I'm not sure).
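A minimal sketch of what such a wrapper might look like, assuming Linux (where "os error 18" is EXDEV) and using only std. The names rename_or_copy and copy_dir_all are hypothetical, not rustup's actual helpers, and the fallback path is not atomic, which matters for rustup's transactional use of rename.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Linux errno for "Invalid cross-device link" (EXDEV), shown as
/// "os error 18" in the log above. Other platforms may use other codes.
const EXDEV: i32 = 18;

/// True if `err` is the cross-device error rename(2) returns on OverlayFS.
fn is_cross_device(err: &io::Error) -> bool {
    err.raw_os_error() == Some(EXDEV)
}

/// Recursively copy a directory tree (hypothetical helper).
/// Symlinks are not preserved: fs::copy follows them.
fn copy_dir_all(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir_all(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Try the atomic rename first; only on EXDEV fall back to a
/// non-atomic copy followed by delete. Any other error is returned as-is.
fn rename_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::rename(src, dst) {
        Err(e) if is_cross_device(&e) => {
            if src.is_dir() {
                copy_dir_all(src, dst)?;
                fs::remove_dir_all(src)
            } else {
                fs::copy(src, dst)?;
                fs::remove_file(src)
            }
        }
        other => other,
    }
}
```

On filesystems where rename succeeds, rename_or_copy behaves exactly like fs::rename; the slow copy path is only taken when the kernel reports EXDEV.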

Interestingly, there is a bootstrap/update problem: folks who are experiencing this may be unable to update their rustup install, and so unable to get the update that fixes the problem once there is a solution. Those folks will need to be advised to reinstall rustup.

If the proposed solution works for the dev team, I'll attempt to provide a PR within a week of getting the go-ahead.

This is relevant because some people use a common Docker image for their CI environments that may not be updated frequently enough for beta/nightly, and have rustup update $desired_env in their scripts, which is how I found this problem.

wraithan, Aug 21 '17 22:08

Spoke with @nrc and @alexcrichton on IRC and they said this seemed reasonable. I'll put forward an implementation this week.

wraithan, Aug 21 '17 22:08

Heyo! Any news on this one? I encounter this bug regularly when doing builds on dockerized CI. Let me know if there is any more info I can provide.

cyplo, Dec 17 '17 13:12

Looking at the sources, there already exists a wrapper function called utils::rename_file; it's used by components and transaction. Would that be a good candidate to replace every other call to fs::rename?

cyplo, Dec 17 '17 13:12

For those affected by this bug, see the renaming section of the kernel documentation.

@wraithan fs::rename in std implies atomicity. A rename that falls back to copying instead of failing should go in a separate crate, since copying will likely involve locking.

ishitatsuyuki, Apr 05 '18 04:04

Heya, thank you to @nrc for taking a look :) (https://internals.rust-lang.org/t/contributing-to-rustup-help-with-code-structure-needed/7193). I'm thinking of trying to tackle this bug. I like writing the replication test first, so I would probably focus on that: trying to inject the fault in the test and seeing what's what. Let me know if someone else wants to look into this as well; we can combine forces :)
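One way such a fault-injection test could be structured without a Docker container, sketched under the assumption that the rename primitive can be passed in as a parameter. move_file_with and always_exdev are hypothetical helpers, the error code 18 is EXDEV on Linux, and the fallback here only handles single files.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: the rename primitive is injected so a test can
/// simulate the OverlayFS failure deterministically.
fn move_file_with<R>(src: &Path, dst: &Path, rename: R) -> io::Result<()>
where
    R: Fn(&Path, &Path) -> io::Result<()>,
{
    match rename(src, dst) {
        // 18 = EXDEV on Linux ("Invalid cross-device link").
        Err(e) if e.raw_os_error() == Some(18) => {
            fs::copy(src, dst)?;
            fs::remove_file(src)
        }
        // Success, or any other error, is passed through unchanged.
        other => other,
    }
}

/// Injected fault: always fail the way rename(2) does on an OverlayFS
/// lower-layer directory.
fn always_exdev(_src: &Path, _dst: &Path) -> io::Result<()> {
    Err(io::Error::from_raw_os_error(18))
}
```

In production the real primitive would be passed as a closure, e.g. move_file_with(a, b, |s, d| fs::rename(s, d)), while the test passes always_exdev to exercise the fallback path.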

cyplo, Apr 06 '18 07:04

Hi, I haven't had much time to finish working on this, and the issue is still present in the newest rustup. Let me know if anyone would like to pick this one up.

cyplo, Jun 08 '19 15:06

@rustbot label: +O-containers

workingjubilee, May 21 '21 19:05

Hey all! Any news on this? @cyplo @wraithan @workingjubilee I'm building a docker image and get "Invalid cross-device link" in the RUN rustup update nightly instruction of my Dockerfile.

@ishitatsuyuki thanks for the documentation. I see that this problem has to do with "redirect_dir" being disabled. So any idea how to enable it through the Dockerfile?

CatarinaPedreira, Sep 14 '21 11:09

@CatarinaPedreira If you need to work around the issue, just remove the toolchain and install it again. I think that avoids renaming across the overlayfs boundary.

ishitatsuyuki, Sep 14 '21 11:09

@ishitatsuyuki Thanks for the quick reply. I'll do that then, thank you :)

CatarinaPedreira, Sep 14 '21 11:09

Copy+Delete would be exceedingly slow because the rename stuff is used in our transactional filesystem accessing code. If we had to open+open+{read,write,loop}+close+close rather than rename then our toolchain update process would become immensely slow. Perhaps we can detect that particular OS error by attempting a rename on something innocuous first, and if that fails, refuse to update a toolchain on such a filesystem. Though that would prevent the installation of new components/targets too. More thought needed, but in the short term the workaround is to either not include a toolchain in your underlying docker image, or else remove and then install the toolchain in your CI.
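The probing idea could look roughly like the sketch below. It is an illustration under stated assumptions, not rustup code: dir_rename_supported is a hypothetical name, and 18 is EXDEV on Linux. Per the kernel's OverlayFS documentation, EXDEV is returned for directories that originate in a lower layer when redirect_dir is off, so probing a freshly created file would not detect the problem; the probe has to rename an existing directory and then put it back.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical probe: rename an existing directory to a sibling name and
/// straight back. Returns Ok(true) if rename works, Ok(false) on EXDEV
/// (os error 18 on Linux), and Err for any other failure. If the restoring
/// rename fails, the directory is left under the probe name.
fn dir_rename_supported(dir: &Path) -> io::Result<bool> {
    let name = dir
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no final component"))?;
    let mut probe_name = name.to_os_string();
    probe_name.push(".rename-probe");
    let probe = dir.with_file_name(probe_name);
    match fs::rename(dir, &probe) {
        Ok(()) => {
            fs::rename(&probe, dir)?; // restore the original name
            Ok(true)
        }
        Err(e) if e.raw_os_error() == Some(18) => Ok(false),
        Err(e) => Err(e),
    }
}
```

As noted above, refusing to proceed on Ok(false) would also block installing new components and targets, so this only detects the situation rather than solving it.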

kinnison, Sep 15 '21 07:09

Thank you @kinnison !

CatarinaPedreira, Sep 15 '21 10:09

As of Rust 1.63.0, I seem to be encountering this issue again during the clippy stage. Here is the relevant log:

$ CARGO_HOME=/usr/local/cargo rustup update stable
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: latest update on 2022-08-11, rust version 1.63.0 (4b91a6ea7 2022-08-08)
info: downloading component 'clippy'
info: downloading component 'cargo'
info: downloading component 'rust-std'
info: downloading component 'rustc'
info: removing previous version of component 'clippy'
info: rolling back changes
error: could not rename component file from '/usr/local/rustup/toolchains/stable-x86_64-unknown-linux-gnu/share/doc/clippy' to '/usr/local/rustup/tmp/1vsy16kvdse0rwk9_dir/bk': Invalid cross-device link (os error 18)
Cleaning up file based variables 00:00
ERROR: Job failed: command terminated with exit code 1

Could this have crept back in somewhere?

ThatGeoGuy, Aug 11 '22 17:08

No, what is happening is that you are updating a toolchain across Docker layers. Either install the correct toolchain in your Docker build, or remove and reinstall your toolchains.

rbtcollins, Aug 11 '22 22:08

I'm experiencing this issue on Fedora Linux. The same logic works nicely on Debian-based systems like Ubuntu, but the error Invalid cross-device link (os error 18) happens on Fedora. The steps to reproduce it are:

  1. Download Edge from https://packages.microsoft.com/repos/edge/pool/main/m/microsoft-edge-stable/microsoft-edge-stable_123.0.2420.53-1_amd64.deb
  2. Extract the content of the DEB file
  3. Try to move the resulting parent folder to a different path using fs::rename()

bonigarcia, Mar 26 '24 22:03

@bonigarcia what does your problem have to do with rustup?

djc, Mar 27 '24 10:03

@djc I believe this problem happens in fs::rename(). If this is not the right place to discuss it, do you know where I should report it?

bonigarcia, Mar 27 '24 12:03

The rust-lang/rust issue tracker covers the standard library.

djc, Mar 27 '24 12:03

What's the status of this bug? I'm getting the exact same error when using act to run github workflows locally. Specifically it happens when my actions require the stable channel version of the toolchain with dtolnay/rust-toolchain@stable.

ThrasherLT, Mar 24 '25 08:03


I think you probably already have a stable Rust toolchain installed and are trying to install another. This was my problem. I have a matrix of versions (1.82, stable and beta) that I run. Only the stable version had this error, and I used the same image (with a version of Rust already installed) on all of them.

So my suggestion @ThrasherLT , try using an image you know doesn't have rust installed. @rbtcollins has it right in https://github.com/rust-lang/rustup/issues/1239#issuecomment-1212541656 methinks.

illogic-al, May 19 '25 19:05

Also experiencing this error when calling rustup component remove rust-docs inside a clean Docker container. This is most definitely not a cross-device link. The root cause appears to be a problem renaming hardlinked directories.

MCE
# Dockerfile
FROM debian
ENV PATH=/root/.cargo/bin:$PATH
RUN apt update && apt install -y curl && \
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | \
  sh -s -- -y --profile minimal
RUN rustup component add rust-docs
RUN rustup component remove rust-docs
Terminal output
$ docker build .
[+] Building 26.4s (7/7) FINISHED                                                                  docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                               0.0s
 => => transferring dockerfile: 308B                                                                               0.0s
 => [internal] load metadata for docker.io/library/debian:latest                                                   0.0s
 => [internal] load .dockerignore                                                                                  0.0s
 => => transferring context: 2B                                                                                    0.0s
 => CACHED [1/4] FROM docker.io/library/debian:latest@sha256:b6507e340c43553136f5078284c8c68d86ec8262b1724dde73c3  0.0s
 => => resolve docker.io/library/debian:latest@sha256:b6507e340c43553136f5078284c8c68d86ec8262b1724dde73c325e8d3d  0.0s
 => [2/4] RUN apt update && apt install -y curl &&   curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs |  20.6s
 => [3/4] RUN rustup component add rust-docs                                                                       5.2s
 => ERROR [4/4] RUN rustup component remove rust-docs                                                              0.5s
------
 > [4/4] RUN rustup component remove rust-docs:
0.461 info: removing component 'rust-docs'
0.470 info: rolling back changes
0.470 error: could not rename component file from '/root/.rustup/toolchains/stable-aarch64-unknown-linux-gnu/share/doc/rust/html' to '/root/.rustup/tmp/yo8q3q11xlm671mg_dir/bk': Invalid cross-device link (os error 18)
------
Dockerfile:8
--------------------
   6 |       sh -s -- -y --profile minimal
   7 |     RUN rustup component add rust-docs
   8 | >>> RUN rustup component remove rust-docs
   9 |
--------------------
ERROR: failed to build: failed to solve: process "/bin/sh -c rustup component remove rust-docs" did not complete successfully: exit code: 1
  • Docker Desktop mac arm 4.43.2 (199162)
    • Settings > General > Virtual Machine Options
      • [x] Apple Virtualization framework
        • [x] Use Rosetta for x86_64/amd64 emulation on Apple Silicon
        • [x] VirtioFS
  • Host OS: macOS 15.5 (24F74)

skull-squadron, Jul 30 '25 03:07

@skull-squadron @bonigarcia Thanks for the report! Your observation is pretty relevant in this case. Quoting the OverlayFS docs:

When renaming a directory that is on the lower layer or merged (i.e. the directory was not created on the upper layer to start with) overlayfs can handle it in two different ways:

  1. return EXDEV error: this error is returned by rename(2) when trying to move a file or directory across filesystem boundaries. ...

... however renames are essential to how rustup currently works with updates, so I don't think it would be practical to solve this problem without a non-trivial redesign.
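Since the failure is specific to OverlayFS, a tool could at least detect that a path sits on an overlay mount before attempting an update. A rough, Linux-only sketch, assuming the caller reads /proc/mounts itself and passes a canonical absolute path (fs_type_for is a hypothetical helper): it picks the mount entry whose mount point is the longest prefix of the path. It only identifies the filesystem type; it cannot tell whether a particular directory lives in a lower layer.

```rust
use std::path::Path;

/// Hypothetical helper: given the text of /proc/mounts, return the fs type
/// of the mount containing `path` (the entry whose mount point is the
/// longest component-wise prefix of `path`). When several entries share a
/// mount point, the last one wins, matching how later mounts shadow earlier
/// ones. Escapes such as "\040" in mount points are not decoded here.
fn fs_type_for<'a>(mounts: &'a str, path: &Path) -> Option<&'a str> {
    mounts
        .lines()
        .filter_map(|line| {
            // Fields: device, mount point, fs type, options, dump, pass.
            let mut fields = line.split_whitespace();
            let _device = fields.next()?;
            let mount_point = fields.next()?;
            let fs_type = fields.next()?;
            Some((mount_point, fs_type))
        })
        .filter(|(mount_point, _)| path.starts_with(mount_point))
        .max_by_key(|(mount_point, _)| Path::new(mount_point).components().count())
        .map(|(_, fs_type)| fs_type)
}
```

On a live system this would be called with the contents of std::fs::read_to_string("/proc/mounts") and a path from std::fs::canonicalize; a result of Some("overlay") for the rustup home would suggest that in-place toolchain updates may hit EXDEV.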

rami3l, Jul 30 '25 04:07