Does OCI allow "cross-layer hardlinks"? And how should they be handled?

Open yhgu2000 opened this issue 5 months ago • 3 comments

Hi, OCI. I was writing an OCI image parser and quickly realized that the handling of hardlinks is seriously underspecified.

First, recall that a hardlink is a filesystem entry that points to the same file (inode) as another filesystem entry. Writing through one hardlink therefore changes the content seen through every other entry that shares the inode, which effectively provides a means of implicit communication. Treating hardlinks as independent regular files can cause runtime errors if an application relies on this shared-inode behavior. Second, as a reminder, OCI image layers are tar archives (e.g. the POSIX ustar or pax formats), which allow both hardlinks and duplicate paths.
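To make the first point concrete, here is a minimal sketch of the shared-inode behavior. The function names and file paths are hypothetical, and it assumes a POSIX system whose filesystem supports hardlinks in the chosen directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `text` to `path`, truncating any existing content. Returns 0 on success. */
static int
write_text(const char* path, const char* text)
{
  FILE* f = fopen(path, "w");
  if (f == NULL)
    return -1;
  size_t len = strlen(text);
  int ok = fwrite(text, 1, len, f) == len;
  return (fclose(f) == 0 && ok) ? 0 : -1;
}

/*
 * Create `orig`, hardlink it as `alias`, then overwrite the content through
 * `alias`. On return, `buf` holds what is read back through `orig`.
 * Returns 1 if both names refer to the same inode, 0 if not, -1 on error.
 */
int
hardlink_shares_inode(const char* orig, const char* alias, char* buf, size_t buflen)
{
  struct stat a, b;
  assert(buflen > 0);

  unlink(orig); /* start from a clean state; errors are ignored */
  unlink(alias);

  if (write_text(orig, "old") != 0 || link(orig, alias) != 0)
    return -1;

  /* fopen(..., "w") truncates the existing inode instead of replacing it. */
  if (write_text(alias, "new") != 0)
    return -1;

  FILE* f = fopen(orig, "r");
  if (f == NULL)
    return -1;
  size_t n = fread(buf, 1, buflen - 1, f);
  buf[n] = '\0';
  fclose(f);

  if (stat(orig, &a) != 0 || stat(alias, &b) != 0)
    return -1;
  return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

Calling hardlink_shares_inode("/tmp/hl_demo_a", "/tmp/hl_demo_b", buf, sizeof buf) should leave "new" in buf even though only the alias was written. If a consumer of the image extracts the alias as an independent regular file, the same sequence would leave "old" behind instead.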

The current specification does say something about hardlinks, but not enough to answer the following questions:

  1. What if a layer contains an invalid hardlink, for example one pointing to a non-existent path? Should we consider the image invalid, or just ignore the entry?

    A tar file can be viewed simply as a sequence of POSIX file metadata and content records. As far as I know, most tar programs process an archive in order: they read it from head to tail and create each file, directory, hardlink, or symlink as its entry header is encountered, leaving validity checks to the OS filesystem.

    For example, the following tar file can be extracted successfully, where ./b is a hardlink to ./a:

    ./a
    ./b => /a
    

    However, the following tar file may fail to extract, because ./a has not been created yet when ./b is processed:

    ./b => /a
    ./a
    
  2. When creating the filesystem bundle, what should we do if a subsequent layer has an entry for a path that is a hardlink in a previous layer (sharing an inode with other filesystem entries)? Should we unlink that filesystem entry from the previous inode and create a new inode with the data from the new layer, or update the existing inode with the data (so that all hardlinked filesystem entries are affected)?

  3. When building an OCI image, how should a hardlink be recorded if the user creates it to a file from a previous layer? In that case the layer on its own may be an invalid tar file, yet it can be extracted successfully as long as the previous layers have been extracted first, in order.

  4. (I believe there are more problems related to the tar format. Comments are welcome.)
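The in-order processing described in question 1 can be sketched as a scan over the 512-byte ustar headers of a single layer that reports the first hardlink whose target has not appeared earlier in the same archive. This is only a sketch under stated assumptions: the function names are hypothetical, names are compared as exact strings without path normalization, the ustar prefix field, pax extended headers, and header checksums are ignored, and link entries are assumed to carry no data blocks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Field offsets and widths of a ustar header block. */
enum { BLOCK = 512, NAME_OFF = 0, NAME_LEN = 100, SIZE_OFF = 124,
       SIZE_LEN = 12, TYPE_OFF = 156, LINK_OFF = 157 };

/* Copy a fixed-width, possibly unterminated header field into a C string. */
static void
field(const unsigned char* hdr, size_t off, size_t len, char* out)
{
  memcpy(out, hdr + off, len);
  out[len] = '\0';
}

/* Offset of the header that follows the entry whose header is at `pos`. */
static size_t
next_header(const unsigned char* tar, size_t pos)
{
  const unsigned char* hdr = tar + pos;
  char size[SIZE_LEN + 1];
  size_t data = 0;
  /* Assumption: hardlink ('1') and symlink ('2') entries have no data. */
  if (hdr[TYPE_OFF] != '1' && hdr[TYPE_OFF] != '2') {
    field(hdr, SIZE_OFF, SIZE_LEN, size);
    data = strtoul(size, NULL, 8); /* size is stored as octal text */
  }
  return pos + BLOCK + (data + BLOCK - 1) / BLOCK * BLOCK;
}

/*
 * Return 1 and copy the entry name into `bad` if a hardlink entry names a
 * target that does not appear earlier in this archive; return 0 otherwise.
 */
int
find_forward_hardlink(const unsigned char* tar, size_t len, char bad[NAME_LEN + 1])
{
  size_t pos = 0;
  while (pos + BLOCK <= len && tar[pos] != '\0') { /* zero block ends the archive */
    const unsigned char* hdr = tar + pos;
    if (hdr[TYPE_OFF] == '1') {
      char target[NAME_LEN + 1];
      int seen = 0;
      field(hdr, LINK_OFF, NAME_LEN, target);
      /* Rescan the entries that precede this one. */
      for (size_t p = 0; p < pos && !seen; p = next_header(tar, p)) {
        char name[NAME_LEN + 1];
        field(tar + p, NAME_OFF, NAME_LEN, name);
        seen = strcmp(name, target) == 0;
      }
      if (!seen) {
        field(hdr, NAME_OFF, NAME_LEN, bad);
        return 1;
      }
    }
    pos = next_header(tar, pos);
  }
  return 0;
}

/*
 * Helper for experiments: write a header-only entry at `pos` (no checksum is
 * computed) and return the offset just past it. The caller provides the room.
 */
size_t
append_header(unsigned char* tar, size_t pos, const char* name, char type,
              const char* link)
{
  unsigned char* hdr = tar + pos;
  memset(hdr, 0, BLOCK);
  strncpy((char*)hdr + NAME_OFF, name, NAME_LEN);
  memcpy(hdr + SIZE_OFF, "00000000000", SIZE_LEN); /* 11 octal digits + NUL */
  hdr[TYPE_OFF] = (unsigned char)type;
  strncpy((char*)hdr + LINK_OFF, link, NAME_LEN);
  return pos + BLOCK;
}
```

An archive built with append_header as ./a followed by ./b linking to ./a passes the check, while the reversed order is reported with ./b as the offending entry. Note that a check like this, applied per layer, would also flag the cross-layer case from question 3, since the target lives in a different archive. That is exactly the ambiguity I am asking about.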

There is an existing issue about hardlinks and symlinks: https://github.com/opencontainers/image-spec/issues/857 . But I believe it does not cover all the problems I listed above.


For question 3, I ran an experiment with Docker. I wrote a simple statically linked C program that can create a copy, a hardlink, or a symlink, and print the device and inode numbers of given paths:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int
copy(const char* source, const char* target)
{
  FILE* sf = fopen(source, "r");
  if (sf == NULL) {
    perror("Error opening source file");
    return EXIT_FAILURE;
  }

  FILE* tf = fopen(target, "w");
  if (tf == NULL) {
    perror("Error opening target file");
    fclose(sf);
    return EXIT_FAILURE;
  }

  char buffer[4096];
  size_t bytesRead;
  while ((bytesRead = fread(buffer, 1, sizeof(buffer), sf)) > 0) {
    if (fwrite(buffer, 1, bytesRead, tf) != bytesRead) {
      perror("Error writing to target file");
      fclose(sf);
      fclose(tf);
      return EXIT_FAILURE;
    }
  }

  fclose(sf);
  fclose(tf);
  return EXIT_SUCCESS;
}

int
main(int argc, char* argv[])
{
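  if (argc < 3) {
    fprintf(stderr,
            "Usage: %s <c|h|s> <source_file> <destination_file>\n"
            "       %s p <path>...\n",
            argv[0], argv[0]);
    return EXIT_FAILURE;
  }

  const char* mode = argv[1];
  /* Modes c, h and s need both a source and a destination path. */
  if (mode[0] != 'p' && argc < 4) {
    fprintf(stderr, "Mode %s needs <source_file> <destination_file>\n", mode);
    return EXIT_FAILURE;
  }
  const char* source = argv[2];
  const char* target = argc > 3 ? argv[3] : NULL;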

  if (mode[0] == 'c') {
    if (copy(source, target) == 0) {
      printf("File copied successfully: %s -> %s\n", source, target);
    } else {
      return EXIT_FAILURE;
    }
  }

  else if (mode[0] == 'h') {
    if (link(source, target) == 0) {
      printf("Hard link created successfully: %s -> %s\n", target, source);
    } else {
      perror("Error creating hard link");
      return EXIT_FAILURE;
    }
  }

  else if (mode[0] == 's') {
    if (symlink(source, target) == 0) {
      printf("Symbolic link created successfully: %s -> %s\n", target, source);
    } else {
      perror("Error creating symbolic link");
      return EXIT_FAILURE;
    }
  }

  else if (mode[0] == 'p') {
    for (int i = 2; i < argc; ++i) {
      printf("%s: ", argv[i]);

      struct stat st;
      if (stat(argv[i], &st) == 0) {
        printf("%lu-%lu ", (unsigned long)st.st_dev, (unsigned long)st.st_ino);
      } else {
        perror(argv[i]);
      }

      struct stat lst;
      if (lstat(argv[i], &lst) == 0) {
        printf("%lu-%lu ", (unsigned long)lst.st_dev, (unsigned long)lst.st_ino);
      } else {
        perror(argv[i]);
      }

      printf("\n");
    }
  }

  else {
    fprintf(stderr, "Invalid mode: %s\n", mode);
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}

Then I built an image from scratch with the compiled C program:

FROM scratch
COPY a.out /a.out
RUN ["/a.out", "c", "/a.out", "/a"]
RUN ["/a.out", "h", "/a.out", "/b"]
RUN ["/a.out", "s", "/a.out", "/c"]
CMD ["/a.out", "p", "/a.out", "/a", "/b", "/c"]

When I run the image on the same machine that built it, here's the output:

/a.out: 97-19830717 97-19830717 
/a: 97-19830716 97-19830716 
/b: 97-19830717 97-19830717 
/c: 97-19830717 97-19830718 

We can see that /b is a hardlink to /a.out, as expected.

However, when I use docker save to dump the image into a .tar.gz file, I find that the /b entry in layer 3 actually has typeflag 0, which means it is stored as a regular file rather than as a hardlink. To confirm my suspicion, I copied the .tar.gz file to another machine with Docker and ran the image there. The result is:

/a.out: 120-962490983 120-962490983 
/a: 120-962490987 120-962490987 
/b: 120-962490991 120-962490991 
/c: 120-962490983 120-962782553 

This means /b is now a regular file with its own inode, which is not what I expected. Or is it? Either way, this example suggests that even Docker does not handle this situation consistently.
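For anyone who wants to repeat the typeflag check on a saved layer, the type of an entry can be read directly from byte 156 of its 512-byte header. A minimal sketch, assuming plain ustar headers; the function names are hypothetical, and pax or GNU extension types are simply reported as "other".

```c
#include <assert.h>
#include <stddef.h>

enum { TAR_BLOCK = 512, TAR_TYPE_OFF = 156 };

/* Return 1 if the header describes a hardlink entry (typeflag '1'). */
int
tar_is_hardlink(const unsigned char header[TAR_BLOCK])
{
  assert(header != NULL);
  return header[TAR_TYPE_OFF] == '1';
}

/* Map the ustar typeflag of a header block to a human-readable name. */
const char*
tar_type_name(const unsigned char header[TAR_BLOCK])
{
  assert(header != NULL);
  switch (header[TAR_TYPE_OFF]) {
    case '\0': /* pre-POSIX archives use NUL for regular files */
    case '0':
      return "regular file";
    case '1':
      return "hardlink";
    case '2':
      return "symlink";
    case '3':
      return "character device";
    case '4':
      return "block device";
    case '5':
      return "directory";
    case '6':
      return "fifo";
    default:
      return "other";
  }
}
```

Applied to the /b header in layer 3 of my saved image, this would report "regular file", matching what I saw, whereas a layer that preserved the link would report "hardlink".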

yhgu2000 · Sep 19 '24 14:09