puppet-archive
Cache to decrease provisioning time in local environment
Folks, here in my organization we frequently (like most of you too, I guess) have to download some big files to test our manifest execution. We lose time provisioning our VMs twice (or more) due to problems in our manifests and/or the local network, which forces us to download these files repeatedly. Initially we thought about setting up a proxy to intercept and cache all requested files, but this might require too much configuration on each developer's machine, or a VM used exclusively for this purpose.
So, a coworker made some changes to the archive module to allow caching with wget plus the vagrant-cachier plugin (https://github.com/fgrehm/vagrant-cachier), and I adapted it a little to propose a fork. Which approach do you think is better: the one using a proxy, or the one with the vagrant-cachier plugin? The changes we made to the archive module are in my fork (https://github.com/thiagomarinho/puppet-archive/tree/wget-cache), and I can open a pull request if you think it's relevant.
@thiagomarinho My initial thoughts are that I don't like the idea of caching locally - using a remote caching proxy and sending all requests through it is much cleaner. Also, if it helps, I believe puppet-archive won't re-download the file if it finds it on the local filesystem already. I could be confusing this with puppet-staging behaviour, however.
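For reference, a minimal archive resource looks something like this (the URL and path here are just placeholders, not from this issue); if the file at the title path is already present on the node, the module should not fetch it again:
archive { '/tmp/solr-4.10.4.tgz':
  ensure => present,
  source => 'http://artifacts.example.com/solr-4.10.4.tgz',
}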
Hi @juniorsysadmin, thank you for the answer :). Maybe my initial explanation was a little confusing, so I'll try to explain the scenario better: currently, if we need to destroy and recreate a virtual machine in the local environment (i.e. vagrant destroy -f && vagrant up), any artifact needed for the VM's provisioning is downloaded again. So, what I'm proposing is to adapt the module to use a wget cache (with the vagrant-cachier plugin, which shares the cache folder - maintained on the host - with the guest machine). This functionality is intended only for testing purposes and is disabled by default; to enable it you add this code to your Puppet manifest:
Archive {
  provider => 'wget',
  cache    => enabled,
}
That said, what do you think about it? I agree that using a remote proxy is cleaner, but I just wanted to be sure that I'm being clear about the problem. :)
I think this is too niche. With Vagrant in particular, can't you push files to the box inside the Vagrantfile?
It sounds like this can be implemented with a few arguments to wget. If that is true, I'm down with that. I don't want caching code littering the module, but if one flag to Puppet turns on one flag to wget, I'm happy to have that.
Hi @jyaworski, yes, maybe it's too niche, indeed. :( It's possible to push those files (though I haven't tried it yet), but I think it might become a little tricky to manage. For example, every large file that should be downloaded during provisioning would also have to be listed in the Vagrantfile and downloaded beforehand to make it work.
Hi @nibalizer, unfortunately changing wget flags alone is not enough to use this cache feature the way we designed it. The behaviour is: download the file to /var/wget/cache (if the file doesn't exist there or if its timestamp has changed) and then copy it to the desired location. The vagrant-cachier plugin keeps previously downloaded artifacts in this directory.
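In plain Puppet terms, the behaviour is roughly equivalent to something like this (just a sketch to show the idea - the resource names, URL and file names are made up, and the real implementation lives in the wget provider of the fork):
$cache_dir = '/var/wget/cache'
$source    = 'http://artifacts.example.com/big-artifact.tar.gz'
$target    = '/opt/artifacts/big-artifact.tar.gz'

# wget -N (timestamping) only re-downloads when the remote file has changed;
# vagrant-cachier keeps $cache_dir on the host across vagrant destroy/up.
exec { 'fetch big-artifact into cache':
  command => "wget -N -P ${cache_dir} ${source}",
  path    => ['/usr/bin', '/bin'],
}

file { $target:
  ensure  => file,
  source  => "${cache_dir}/big-artifact.tar.gz",
  require => Exec['fetch big-artifact into cache'],
}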
Not sure whether it's the best approach, but a cache might also come in handy during the execution of Beaker acceptance tests. This feature could open the door to introducing a cache mechanism here: https://github.com/puppetlabs/beaker/blob/master/lib/beaker/hypervisor/vagrant.rb#L21
Currently, vagrant-cachier supports a myriad of buckets (yum, apt, gem and others), but among the puppet-archive providers, only ruby and wget would be suitable for caching - as long as a directory for this purpose is defined - otherwise we would need to explicitly define a bucket/cache_dir for each artifact to be cached with vagrant-cachier.
The nature of the puppet-archive resource does indeed prevent re-downloading a file you already have, but this issue is discussing a cache in another scope and for other purposes.
Simply copying files would suffice in some scenarios, but imagine a case where the target directory for an archive extraction is only created after a series of successful steps.
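A hypothetical example of that case (package name, URL and paths invented here just for illustration): the extract_path only exists after an earlier resource succeeds, so the archive has to be fetched and extracted at provisioning time rather than pushed up front from the Vagrantfile.
package { 'tomcat7':
  ensure => installed,
}

# /var/lib/tomcat7/webapps only exists once the package above is installed,
# so the download/extraction cannot simply be replaced by a file push.
archive { '/tmp/myapp.tar.gz':
  ensure       => present,
  source       => 'http://artifacts.example.com/myapp.tar.gz',
  extract      => true,
  extract_path => '/var/lib/tomcat7/webapps',
  require      => Package['tomcat7'],
}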