
file provisioner behaves surprisingly when target folder not present

Open schmidlop opened this issue 8 years ago • 20 comments

Terraform Version

0.10.7

Terraform Configuration Files

resource "aws_instance" "master" {
  ...
  connection {
    type         = "ssh"
    host         = "${aws_instance.master.public_dns}"
    agent        = false
    user         = "ubuntu"
    private_key  = "${file(var.private_key_file)}"
  }

  provisioner "file" {
    source = "registry/config.yaml"
    destination = "/home/ubuntu/registry/config.yaml"
  }

  tags = {
    Name = "swarm-manager"
  }
}

Expected Behavior

Either

  1. config.yaml file located in directory /home/ubuntu/registry or
  2. error message stating that the directory doesn't exist

Actual Behavior

new file named registry located in /home/ubuntu

Steps to Reproduce

  1. terraform apply

Important Factoids

Adding the following before the file provisioner works around the issue:

  provisioner "remote-exec" {
    inline = [
       "cd /home/ubuntu",
       "sudo mkdir registry",
       "sudo chown ubuntu registry"
    ]
  }

Because the mkdir runs under sudo, the directory ends up owned by root, hence the chown command is necessary; otherwise the file provisioner fails with a permission-denied error.
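Since /home/ubuntu is already writable by the connecting ubuntu user, a simpler variant of the same workaround (an untested sketch) avoids sudo and chown entirely, and `mkdir -p` keeps it idempotent on re-runs:

```terraform
# Sketch: create the target directory as the connection user before uploading.
# `mkdir -p` succeeds even if the directory already exists, so re-applies are safe.
provisioner "remote-exec" {
  inline = [
    "mkdir -p /home/ubuntu/registry"
  ]
}

provisioner "file" {
  source      = "registry/config.yaml"
  destination = "/home/ubuntu/registry/config.yaml"
}
```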

schmidlop avatar Oct 12 '17 20:10 schmidlop

Also ran into this today.

mmacfadden avatar May 19 '18 14:05 mmacfadden

Same here. I would expect either an error or the directory to be automatically created.

jjhbw avatar Oct 23 '18 08:10 jjhbw

Even more fun, if you try to create /home/ubuntu/registry/config.yaml and /home/ubuntu/registry/file2.txt the content of the latter ends up in /home/ubuntu/registry. I guess that isn't all that surprising.

kcd83 avatar Jan 04 '19 04:01 kcd83

Just ran into this

pbibik avatar Feb 12 '19 19:02 pbibik

Still an issue in 2019...

jmdfm avatar May 14 '19 23:05 jmdfm

Wasted some good hours today going through my whole configuration again and again, believing I had done something fundamentally wrong when my config file was being written to the directory name rather than to the full path. I eventually gave up, decided to search the web, and found this issue...

I am quite sad and depressed right now and my neck hurts from the stress, especially because the docs actually state, though in a different context, that "The foo directory on the remote machine will be created by Terraform". That completely misled me into believing I had made the mistake, and that the provisioner should have been creating the missing config folder.

gchamon avatar Jul 02 '19 21:07 gchamon

I tested what is in the documentation. If I understood correctly, I could just use

provisioner "file" {
  source = "foo/"
  destination = "/home/user/bar"
}

with foo containing a file baz, then bar would be created on the remote and the contents of baz would be uploaded to /home/user/bar/baz:

If the source is /foo (no trailing slash), and the destination is /tmp, then the contents of /foo on the local machine will be uploaded to /tmp/foo on the remote machine. The foo directory on the remote machine will be created by Terraform. If the source, however, is /foo/ (a trailing slash is present), and the destination is /tmp, then the contents of /foo will be uploaded directly into /tmp.

What actually happens is that the contents of baz are uploaded to bar, which is even more surprising since baz isn't even referenced in the provisioner.

The workaround of (deleting and) creating bar beforehand works for both uploading the file explicitly or the folder contents.

At the very least this bug should be acknowledged and described in the docs, to prevent it from causing more headaches for unsuspecting beginners.
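The delete-and-recreate workaround mentioned above could look roughly like this (an untested sketch, using the hypothetical paths from the documentation example):

```terraform
# Sketch: ensure a clean target directory exists before the upload.
provisioner "remote-exec" {
  inline = [
    "rm -rf /home/user/bar",
    "mkdir -p /home/user/bar"
  ]
}

# With the directory in place, the documented trailing-slash behavior applies:
provisioner "file" {
  source      = "foo/"            # trailing slash: upload the contents of foo
  destination = "/home/user/bar"
}
```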

gchamon avatar Jul 02 '19 22:07 gchamon

Unsuspecting beginner here ran into this today.

CharlesTBetz avatar Sep 22 '19 14:09 CharlesTBetz

Still having to do workarounds to get round this. It would be great to have this behave as expected.

afandian avatar Sep 26 '19 10:09 afandian

+1 The same problem.

okgolove avatar Nov 18 '19 20:11 okgolove

+1

bolau avatar Feb 21 '20 16:02 bolau

Ran into this while trying to create a kubeconfig file on a remote VM, and worked around it like this:

resource "null_resource" "aksvmkubeconfig" {
  # https://github.com/hashicorp/terraform/issues/16330

  provisioner "remote-exec" {
    connection {
      host        = azurerm_public_ip.aksvm.ip_address
      type        = "ssh"
      user        = var.admin_username
      private_key = tls_private_key.sshkey.private_key_pem
    }
    inline = [
      "mkdir -p /home/ubuntu/.kube/"
    ]
  }

  provisioner "file" {
    connection {
      host        = azurerm_public_ip.aksvm.ip_address
      type        = "ssh"
      user        = var.admin_username
      private_key = tls_private_key.sshkey.private_key_pem
    }
    content     = azurerm_kubernetes_cluster.aks.kube_config_raw
    destination = "/home/ubuntu/.kube/config"
  }
}

ams0 avatar Feb 23 '20 16:02 ams0

Still an issue on 0.12

teej avatar Apr 02 '20 18:04 teej

wow, hit this too

SvenDowideit avatar Apr 04 '20 04:04 SvenDowideit

So instead of safely creating the ~/.ssh dir and putting a single ssh key's pub file into authorized_keys, the easiest workaround is to copy my entire ~/.ssh dir?

resource "null_resource" "controllerpi" {
  connection {
    type = "ssh"    
    user = var.initial_user
    password = var.initial_password
    host = "10.10.10.129"
  }

  provisioner "file" {
    #TODO: this is an awful workaround for https://github.com/hashicorp/terraform/issues/16330
    # source      = "~/.ssh/id_rsa.pub"
    # destination = "/home/pi/.ssh/authorized_keys"
    source      = "~/.ssh"
    destination = "/home/pi/.ssh"
  }
}

mildly surprised
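For what it's worth, the originally intended single-file upload can still work if a remote-exec step creates ~/.ssh first; a rough, untested sketch:

```terraform
# Sketch: create the directory before uploading the single key file.
provisioner "remote-exec" {
  inline = [
    "mkdir -p /home/pi/.ssh",
    "chmod 700 /home/pi/.ssh"   # sshd expects restrictive permissions here
  ]
}

provisioner "file" {
  source      = "~/.ssh/id_rsa.pub"
  destination = "/home/pi/.ssh/authorized_keys"
}
```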

SvenDowideit avatar Apr 04 '20 04:04 SvenDowideit

+1 happened to me as well

BLshlomo avatar Dec 06 '20 23:12 BLshlomo

still an issue in 2022

one workaround is using cloud-init with write_files
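On platforms with a user-data mechanism, that workaround might be sketched like this (untested; the AWS resource and paths are borrowed from the original report, and cloud-init's write_files creates missing parent directories itself):

```terraform
resource "aws_instance" "master" {
  # ...

  # Sketch: deliver the file via cloud-init instead of the file provisioner.
  user_data = <<-EOT
    #cloud-config
    write_files:
      - path: /home/ubuntu/registry/config.yaml
        owner: ubuntu:ubuntu
        permissions: "0644"
        content: |
          ${indent(10, file("registry/config.yaml"))}
  EOT
}
```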

MarZab avatar Jan 25 '22 15:01 MarZab

Hi all!

The weird behavior here unfortunately seems to belong to the scp program on the target system, rather than Terraform itself. The file provisioner is acting as a scp client to upload the file to the target system, so exactly what happens on the other end is under the server's control.

I produced the problem outside of Terraform by pretending to be a scp client right at my terminal:

  • Run scp -vt target_dir to start the scp server. Terraform would normally create an SSH session and then run this command on the remote system to create a file transfer channel. In the example in the original issue comment above, target_dir would've been /home/ubuntu/registry: the directory that the subsequent scp operations will be relative to.

  • The SCP server expects a line-based protocol which I was able to mimic close enough for the OpenSSH implementation on my system, even though what I did here wasn't technically completely correct per the protocol:

    C0644 5 hello.txt
    hello
    
    

    The first line above means to Create a file with mode 0644, five bytes long, named hello.txt. That's followed by a newline character to end the command.

    Then the bytes hello are literally the five bytes of the file contents.

    Technically at this point the client is supposed to send a NUL character to positively acknowledge completion, but I can't easily type that on my keyboard so I just mashed enter a few times and that seemed to satisfy the OpenSSH server enough to make it give up and exit, returning me to my shell prompt. The create operation streams the data directly to disk, so the bytes I sent still ended up in the file nonetheless.

  • I then found a file called target_dir in my current working directory, containing the message hello.

    That matches the misbehavior that this bug report is about: we expected to find a directory named target_dir with a file called hello.txt inside.

It seems then that OpenSSH's scp doesn't return an error when asked to target a non-existent directory. Instead, it just ignores the specified filename and writes the file contents directly into a file named after the target directory. Weird!

The scp source code is pretty arcane, but I think this odd behavior results from what it does when the target isn't a directory: it seems to just take the path given on the command line and use that instead of the file specified in the C line. targisdir seems to be false when the target doesn't exist at all, and so I think we end up in this codepath where the target file is target_dir itself, instead of target_dir/hello.txt.

The scp protocol also includes a command D for creating directories. I did find one usage pattern that seems to get the result we'd hope for here:

$ scp -vrt .
D0777 0 foo
C0644 5 hello.txt
hello


This caused me to have a directory called foo with a file called hello.txt inside it, and the bytes hello inside that file. :tada:

Re-running that same sequence a second time indicated that this is idempotent: it'll happily ignore a request to create a directory that already exists but then create a file inside that directory.

With that said then, it does seem like there is a plausible sequence of scp commands we can run to ensure the desired effect, but there are some potential gotchas here:

  • The client needs to specify either . or / as the scp target directory depending on whether the given path is an absolute path or not, and then traverse one directory level at a time from there.

    So for the example in the original issue report, this would be something like:

    $ scp -vrt /
    D0755 0 home
    D0755 0 ubuntu
    D0755 0 registry
    C0644 NNN config.yaml
    ....
    

    This means that the client must be somehow aware of the filesystem conventions of the server. In particular, I think this strategy would not work well when the server is running Windows because the client would need to understand if the given path starts with a drive letter and use that as the scp target directory.

  • This requires running scp in "recursive" mode, which may have other consequences which we've not fully understood yet. I looked in the scp.c source code for references to iamrecursive, which is the flag set when the -r option appears on the command line. It doesn't seem to do anything bothersome, but I don't necessarily trust my read of this unfamiliar C code that relies a lot on global variables.

There is also the typical bothersome question about whether we can actually change this behavior now because of the Terraform v1.x Compatibility Promises. While the current behavior is definitely weird, we cannot be sure that someone isn't unknowingly depending on it for the correct behavior of their system and so would consider a fix of this bug to be a breaking change for their configuration. :confounded:

At this point we are largely just preserving the behavior of these provisioners as-is, without making any significant changes to their behavior, because provisioners are a last resort and there's so much existing code out there depending on these quirks that have existed now for many years. It may be more profitable to instead build a provider with an ssh_file (or similar) resource type that can treat a particular remote file as a normal Terraform resource under management. A resource type with only a create step is functionally equivalent to a provisioner anyway, and in this case it may be desirable to also offer a destroy step which deletes the file (though I expect that would be optional).

I see that there's already a community provider loafoe/ssh which offers an ssh_resource resource type that incorporates functionality from both the file provisioner and the remote-exec provisioner when running against an SSH server. I can't vouch for the behavior of that provider because I've not tried it or reviewed it, but it does show that there is a plausible design for a provider to supplant the use-cases of these legacy provisioners. A new provider would not be subject to the compatibility constraints of the provisioners and so would be free to solve this problem without regard to backward-compatibility. That particular provider implementation seems to use an scp usage pattern similar to Terraform's file provisioner, though, which probably means it also requires the target directory to exist in order to get correct results. :thinking:

I think the best path here would be to investigate the possibility of an SSH provider which offers something similar to the third-party provider I linked above, and have that be the successor of both the file and remote-exec provisioners when targeting an SSH server. The provisioners would then live on primarily for backward-compatibility with existing configurations and not for use in new configurations (though that'll presumably also require there to be a matching WinRM provider for feature-parity).

As @MarZab said, it's preferable to use Cloud-Init to copy files onto your remote systems if you possibly can, as described in Passing data into virtual machines and other compute resources, but there are some target platforms which don't have a "user data" mechanism and so SSH connections to the running machine are the only option. Therefore I do think it's justified to retain a supported path for doing so, even if it becomes a provider rather than a provisioner moving forward.

For now I'm going to leave this in the bug process until we're able to decide better what to do with it, but I can at least tag it as "confirmed" because the problem is easy to reproduce with just scp itself, even without Terraform calling it.

Thanks all!

apparentlymart avatar Sep 30 '22 00:09 apparentlymart

@apparentlymart if scp can't be relied on to act as expected, why not check for the directory and create it if necessary before using scp? Am I missing something?

brandonzylstra avatar Oct 13 '25 19:10 brandonzylstra

Hi @brandonzylstra,

I no longer work on Terraform at HashiCorp, so what I'm about to say is just some personal remarks and I'm not speaking on behalf of the Terraform team.

The SSH protocol is primarily concerned with executing commands on the remote system. The scp protocol relies on there being a program literally called scp on the remote system which the client can execute and then stream data into. Performing any operation that the scp protocol cannot support would mean running some other command, which would then impose a new requirement for what software is available on the remote system. That would therefore be a breaking change.

The approach I described in my previous comment would effectively cause the directory to be created if it doesn't exist already, using only existing features of the scp protocol. However, it would change the existing behavior in ways that might be breaking for existing users that are (perhaps unknowingly) relying on the existing quirky behavior.

I don't personally see any option to make this "just work" in the way described without potentially breaking existing other configurations. I previously suggested to create a new provider to replace both of these provisioners and I stand by that recommendation, particularly with the forthcoming concept of "actions" which could therefore provide a more direct replacement for the remote-exec provisioner as an imperative action without an associated managed resource lifecycle.

However, a potential alternative approach would be to add a new argument to the file provisioner to opt in to the new behavior I described in my previous comment. Of course, it not being the default behavior means that the provisioner would still do arguably the "wrong" thing by default, which is annoying.
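To illustrate that opt-in idea concretely: the argument might look something like the following, where `create_dirs` is an invented name for this sketch, not an existing Terraform feature:

```terraform
# Hypothetical sketch only: create_dirs does not exist in any released Terraform.
provisioner "file" {
  source      = "registry/config.yaml"
  destination = "/home/ubuntu/registry/config.yaml"
  create_dirs = true  # hypothetical opt-in: create missing directories via scp D commands
}
```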

The Terraform team could also potentially choose to break anyone relying on the existing behavior if they think it's unlikely enough that anyone would be relying on it, and therefore make the new behavior the default.

With all of that said then: there are at least three paths here that all have different benefits and drawbacks, and so someone on the Terraform team will presumably need to analyze them all and decide which path (if any) to take.

apparentlymart avatar Oct 13 '25 20:10 apparentlymart