vagueant
Provisioning hangs
I added the full path of a shell script that echoes 'hello' to the provisioner variable in the vagueant.conf file, and when I run vagueant up it hangs while running the provisioner.
I have the following processes running:
root 17897 0.0 0.0 2864 1064 ? Ss 15:09 0:00 lxc-start -n provisioner-test -c /var/run/lxc/provisioner-test.console -d
root 17898 0.0 0.0 4240 540 pts/1 S+ 15:09 0:00 tail -f /var/lib/lxc/provisioner-test/rootfs/var/log/runonce.log
root 18166 0.0 0.0 5224 1412 pts/1 S+ 15:09 0:00 /bin/bash /usr/bin/lxc-wait -n provisioner-test -s STOPPED
I also have processes that are in the uninterruptible sleep (D) state:
frank@frankthetank:~/precise2$ ps aux | grep lxc
123 1619 0.0 0.0 3420 892 ? S Dec16 0:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0
root 19731 0.0 0.0 2864 904 ? Ds 15:14 0:00 lxc-start -n precise -d -c /var/run/lxc/precise.console
root 20136 0.0 0.0 2864 904 ? Ds 15:16 0:00 lxc-start -n precise -d -c /var/run/lxc/precise.console
root 20730 0.0 0.0 2864 904 ? Ds 15:19 0:00 lxc-start -n precise -d -c /var/run/lxc/precise.console
root 21124 0.0 0.0 2864 904 ? Ds 15:20 0:00 lxc-start -n precise2 -d -c /var/run/lxc/precise2.console
frank 21357 0.0 0.0 4396 820 pts/12 S+ 15:23 0:00 grep lxc
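For anyone checking for the same symptom, here is a rough sketch of how the listing above could be scanned for stuck containers. It assumes the standard 11-column `ps aux` layout, and the function name `find_d_state_lxc` is purely illustrative; it is not part of vagueant or LXC.

```python
import re
from collections import defaultdict


def find_d_state_lxc(ps_lines):
    """Group PIDs of lxc-start processes in D state by container name.

    Assumes `ps aux` output: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    stuck = defaultdict(list)
    for line in ps_lines:
        # Split into at most 11 fields so the full command stays in one piece.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, stat, command = fields[1], fields[7], fields[10]
        # A STAT value starting with "D" means uninterruptible sleep.
        if not stat.startswith("D"):
            continue
        # Pull the container name from the "-n <name>" argument of lxc-start.
        match = re.search(r"\blxc-start\b.*?-n\s+(\S+)", command)
        if match:
            stuck[match.group(1)].append(int(pid))
    return dict(stuck)


# Two lines copied from the listing above.
sample = [
    "root 19731 0.0 0.0 2864 904 ? Ds 15:14 0:00 lxc-start -n precise -d -c /var/run/lxc/precise.console",
    "root 21124 0.0 0.0 2864 904 ? Ds 15:20 0:00 lxc-start -n precise2 -d -c /var/run/lxc/precise2.console",
]
print(find_d_state_lxc(sample))
```

More than one PID under the same container name would suggest that the same container was started several times.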
In dmesg I see the following:
[517896.334578] INFO: task lxc-start:20136 blocked for more than 120 seconds.
[517896.334580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[517896.334581] lxc-start D 00000000 0 20136 1 0x00000004
[517896.334585] ebf5de74 00200082 00000000 00000000 ebf5de20 f746f230 defd8000 0001d6bd
[517896.334591] c196be00 c196be00 6072be8a 0001d6bd f7babe00 ebd89960 c1107e95 ebf5de30
[517896.334598] c1107ec6 ebf5de6c c11521a2 eeda7070 c17bc906 ebc72a80 ebc72af8 ebf5de50
[517896.334605] Call Trace:
[517896.334610] [<c1107e95>] ? __free_pages+0x35/0x40
[517896.334614] [<c1107ec6>] ? free_pages+0x26/0x30
[517896.334617] [<c11521a2>] ? mount_fs+0xa2/0x180
[517896.334621] [<c106d48e>] ? lg_global_unlock+0x3e/0x50
[517896.334625] [<c15c95d3>] schedule+0x23/0x60
[517896.334628] [<c15c982d>] schedule_preempt_disabled+0xd/0x10
[517896.334632] [<c15c8586>] __mutex_lock_slowpath+0xc6/0x120
[517896.334635] [<c15c8114>] mutex_lock+0x24/0x40
[517896.334638] [<c14d62cc>] copy_net_ns+0x5c/0xd0
[517896.334642] [<c106a411>] create_new_namespaces+0xb1/0x150
[517896.334646] [<c106a5b2>] copy_namespaces+0x72/0xb0
[517896.334650] [<c10430cb>] copy_process.part.28+0x6db/0x10f0
[517896.334654] [<c1043c3a>] do_fork+0x11a/0x350
[517896.334658] [<c10185e4>] sys_clone+0x34/0x40
[517896.334661] [<c15d12d9>] ptregs_clone+0x15/0x3c
[517896.334665] [<c15ca5a4>] ? syscall_call+0x7/0xb
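To match these kernel messages against the PIDs in the `ps` output, a small parser along these lines might help. It is only a sketch: it assumes the "INFO: task NAME:PID blocked for more than N seconds." wording seen in the log above, and `hung_tasks` is a made-up helper name.

```python
import re

# Matches the hung-task line format that appears in the dmesg excerpt above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_lines):
    """Return (command, pid, seconds) tuples for each hung-task report."""
    found = []
    for line in dmesg_lines:
        match = HUNG_TASK.search(line)
        if match:
            found.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return found


# One line copied from the dmesg output above.
print(hung_tasks([
    "[517896.334578] INFO: task lxc-start:20136 blocked for more than 120 seconds.",
]))
```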
I will reboot and try again.
I've been trying to reproduce this but haven't managed anything yet.
My next step will be to add some working (at least on my laptop :p ) examples to the repo to see if that helps.
Although given the current time of year it may be a week or two before I manage to push anything functional up :)
The multiple D state processes have me thinking that maybe we somehow managed to start one with the same name multiple times - maybe I've got a race condition to sort out...
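If it does turn out to be a race, one possible guard is a per-container lock taken before starting. The following is only a sketch of that idea and not vagueant's actual code: the `ContainerLock` class, the lock file naming, and the lock directory are all assumptions made for illustration. It would only rule out duplicate starts; it would not clear a process already blocked inside the kernel.

```python
import fcntl
import os
import tempfile


class ContainerLock:
    """Hypothetical per-container lock built on flock(2)."""

    def __init__(self, name, lock_dir=None):
        # The lock directory and file name pattern are illustrative assumptions.
        lock_dir = lock_dir or tempfile.gettempdir()
        self.path = os.path.join(lock_dir, "container-%s.lock" % name)
        self._fh = None

    def acquire(self):
        """Try to take the lock without blocking; return True on success."""
        fh = open(self.path, "w")
        try:
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Another holder already has this container's lock.
            fh.close()
            return False
        self._fh = fh
        return True

    def release(self):
        """Drop the lock if this object holds it."""
        if self._fh is not None:
            fcntl.flock(self._fh, fcntl.LOCK_UN)
            self._fh.close()
            self._fh = None


lock = ContainerLock("precise")
if lock.acquire():
    try:
        pass  # the container would be started here
    finally:
        lock.release()
else:
    print("a start for this container is already in progress")
```

A second 'up' for the same name would then fail fast instead of launching another lxc-start.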
Cheers, Dave
Yeah, I think it happens if I run 'vagueant up' multiple times or something, or if I destroy the LXC container while it is tailing the provision log. In the meantime I have been working on a 'vagueant template' command, similar to the 'vagrant box' command. I'll probably have something working by the weekend. For now I wish you a merry Christmas and best wishes for 2013! :-)