dlite not starting correctly after reboot
Bug Reports
- dlite version in use (run dlite --version): dlite version 2.0.0-beta8
- expected behavior: dlite should start correctly after reboot and make my day the best day ever.
- actual behavior: dlite doesn't start correctly after reboot and makes the reboot day the worst day ever.
- steps to reproduce: I haven't got a clue
TL;DR
Something seems to be wrong with extractUser, lookupUser or proxy on my machine, I don't really know...
My story
After the first install dlite starts without any problems and runs great, but after a reboot it won't start correctly. Directly after I log in I can find a dlite process but no hyperkit process in Activity Monitor. The dlite process is using 1-2 MB of RAM, which sounds small but probably isn't anything weird.
docker ps returns an error
$ docker ps
Error response from daemon: Unable to connect to the virtual machine
dlite start runs into a timeout. (There are two dlite processes during this time; when it's done, one process is terminated and the original process persists.)
$ dlite start
Starting the virtual machine: ERROR!
Timed out waiting for virtual machine
dlite stop runs into infinity and beyond until I press ctrl-c. (the dlite process is still running)
Running dlite stop again after this:
$ dlite stop
Stopping the virtual machine: done
(the dlite process is still running)
Debug mode activated
So I started digging and found out that some commands make an HTTP POST request to http://127.0.0.1:1050/[command].
Running curl -X POST http://127.0.0.1:1050/start returns Unauthorized
Running curl -X POST --header "X-Username: emil" http://127.0.0.1:1050/start returns Timed out waiting for virtual machine
Running curl -X POST http://127.0.0.1:1050/stop returns Virtual machine is not running (which is expected)
Using Chrome and visiting http://127.0.0.1:1050/status also returns Unauthorized.
It seems like there's something wrong with extractUser, lookupUser or proxy.
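The manual requests above could be wrapped in a small helper so that every probe sends the same header and also shows the HTTP status code. This is only a sketch: dlite_api is a hypothetical name, defaulting the user to the current login name is an assumption, and the only details taken from the report are the base URL, the /[command] paths and the X-Username header.

```shell
#!/bin/sh
# Sketch only: dlite_api is a hypothetical helper, not part of dlite.
# The base URL and the X-Username header come from the curl commands in this report.
DLITE_API="${DLITE_API:-http://127.0.0.1:1050}"

# dlite_api METHOD ENDPOINT [USER]
# Sends one request and prints the response body followed by the HTTP status code.
dlite_api() {
    method="$1"
    endpoint="$2"
    user="${3:-$(id -un)}"   # assumption: default to the current login name
    curl --silent --show-error --max-time 90 \
        --request "$method" \
        --header "X-Username: $user" \
        --write-out '\nHTTP %{http_code}\n' \
        "$DLITE_API/$endpoint"
}
```

For example, dlite_api POST start emil repeats the second request above, and dlite_api GET status emil would show whether the status page is still Unauthorized once a username is supplied.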
I have tried to uninstall everything (I think) and reinstall dlite but with the same results.
I have tried to unload local.docker.plist and loading it again but with the same results.
I'm experiencing the same symptoms, except that http://127.0.0.1:1050/status returns a valid-looking status:
{"id":"a54321e06-be54-11e6-9769-7056818e1367","hostname":"local.docker","disk_size":20,"disk_path":"/Users/kakoni/.dlite/disk.qcow","cpu_cores":2,"memory":2,"dns_server":"192.168.64.1","docker_version":"latest","docker_args":"","route":true,"started":true,"ip":"192.168.64.7","pid":8549}
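Since the status output claims the VM is started while docker cannot reach it, it may help to compare the reported address against what is actually reachable. The following is a rough sketch under a few assumptions: status_field and vm_port_open are hypothetical helpers, the sed expression only handles a flat single-line JSON object like the one above, and port 22 is chosen only because that is the port dlite ssh connects to.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and not part of dlite.

# status_field NAME
# Reads a flat, single-line JSON object on stdin and prints the value of NAME.
# Good enough for the /status output shown above; use a real JSON parser
# for anything nested.
status_field() {
    field="$1"
    sed -n "s/.*\"$field\":\"\{0,1\}\([^,\"}]*\)\"\{0,1\}[,}].*/\1/p"
}

# vm_port_open IP [PORT]
# Succeeds if a TCP connection to the VM can be opened (port 22 by default).
vm_port_open() {
    nc -z -w 3 "$1" "${2:-22}" >/dev/null 2>&1
}
```

A possible use: ip=$(curl --silent http://127.0.0.1:1050/status | status_field ip), followed by vm_port_open "$ip" || echo "status reports started, but $ip is not reachable".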
My story
reboot
$ docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
$ ls -l /var/run/docker.sock
srwxrwxrwx 1 root daemon 0 13 Dec 23:03 /var/run/docker.sock
$ sudo rm -rf /var/run/docker.sock
$ sudo launchctl stop local.dlite
$ sudo launchctl start local.dlite
$ dlite start
Starting the virtual machine: ERROR!
Timed out waiting for virtual machine
$ dlite status
vm_state: started
ip_address: 192.168.64.4
pid: 963
id: aceb2746-a7d0-11e6-affc-80e6502222b0
hostname: local.docker
disk_size: 25
disk_path: /Users/merkushin/.dlite/disk.qcow
cpu_cores: 2
memory: 3
dns_server: 192.168.64.1
docker_version: latest
docker_args: --bip=172.17.0.1/24 --dns=172.17.0.1
$ docker ps
Error response from daemon: Unable to connect to the virtual machine
Waiting 1 minute...
$ docker ps
CONTAINER ID   IMAGE                           COMMAND     CREATED       STATUS         PORTS                   NAMES
7b5e00b68d69   aacebedo/dnsdock:latest-amd64   "dnsdock"   4 weeks ago   Up 2 seconds   172.17.0.1:53->53/udp   dnsdock
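In this session docker ps only started answering about a minute after dlite start had already given up. To measure how long the VM really needs, one could poll instead of retrying by hand. A minimal sketch, assuming a hypothetical wait_for helper and arbitrary timeout values:

```shell
#!/bin/sh
# Sketch only: wait_for is a hypothetical helper, not part of dlite or docker.

# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Returns 0 on success, 1 once
# TIMEOUT_SECONDS of waiting have passed without a successful run.
wait_for() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

For example, wait_for 180 5 docker ps && echo "docker is reachable" waits up to three minutes, checking every five seconds.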
can those of you experiencing this issue run dlite ssh and tell me the version of dlite-os shown?
if you have a version earlier than 1.0.0-beta3 this problem should be fixed by re-running dlite init, if you have dlite-os 1.0.0-beta3 and you're still experiencing this issue i have more debugging to do
$ dlite ssh
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec
$ dlite ssh
[email protected]'s password:
ctrl + c =)
$ ssh docker@$(dlite ip)
[email protected]'s password:
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec
A thing just hit me, I downloaded the binary from "Releases" and did not build it myself. Could this be a thing causing me trouble?
@getninjaN that shouldn't be causing a problem, that's the binary i run on my own laptop without issues.
interesting that the vm seems to be coming up and just isn't phoning home correctly.. can someone who is able to login to their vm run df and tell me available disk space on their vm?
i'm getting this same issue (beta8) and
$ dlite ssh
ssh: connect to host local.docker port 22: Operation timed out
I managed to fix this.
TL;DR
PEBKAC 🙃
The long story
- $ dlite stop (2.0.0-beta8 or beta9)
- Run Docker Toolbox Uninstall Script
- $ brew uninstall docker docker-machine
- $ dlite uninstall (2.0.0-beta8 or beta9)
- $ brew uninstall dlite (2.0.0-beta8 or beta9)
- $ brew install dlite (1.1.5)
- $ brew uninstall dlite (1.1.5)
- Restart macOS
- $ brew install docker-compose
- Download dlite 2.0.0-beta9
- $ cp dlite /usr/local/bin
- $ dlite init
- Got DISK ERROR! as in #217
- $ brew install libev (To fix DISK ERROR!)
- $ dlite init
- docker-compose up and when this was done...
- ... restart macOS
- $ dlite start
- ...
- PROFIT!
Conclusion
Now everything is working like clockwork again. The original problem was probably a combination of having used Kitematic, Docker for Mac and dlite-1.1.5 without properly uninstalling each of them before switching to the next.
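Before reinstalling, a quick look at which Docker-related command-line tools are still on the PATH can show whether an earlier install was left behind. A minimal sketch, assuming a hypothetical list_installed helper; it only covers command-line tools, so app bundles such as Kitematic or Docker for Mac would have to be checked separately.

```shell
#!/bin/sh
# Sketch only: list_installed is a hypothetical helper.

# list_installed NAME...
# Prints "NAME -> LOCATION" for every given command name found on PATH.
list_installed() {
    for name in "$@"; do
        location=$(command -v "$name" 2>/dev/null) \
            && printf '%s -> %s\n' "$name" "$location"
    done
    return 0
}
```

For example: list_installed docker docker-machine docker-compose dlite.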
I'm tired of waiting for a fix for this problem 😸 docker-machine-driver-xhyve is working like a charm.
Well sh*t... Ran into another problem now. My Mac was acting up and I had to kill it with the power switch.
Now when I try to run dlite start I get this error
Starting the virtual machine: ERROR!
chown /Users/emil/.dlite/vm.tty: no such file or directory
In my console I get this for InternetSharing (/usr/libexec/InternetSharing):
2017-02-15 17:03:20.636867 com.docker.hyperkit: com.apple.NetworkSharing.broadcast-1 has been started
2017-02-15 17:03:20.650614 com.docker.hyperkit: com.apple.NetworkSharing.broadcast-1 (idle) has been stopped
Peace and love
well that's a new one.. i've actually seen the no such file or directory for the vm.tty before, though, so i'll open an issue for that specifically.
the InternetSharing stuff though, that's a new one. is there anything interesting in /Users/emil/.dlite/vm.log? likely at the very bottom
Nope.. vm.log wasn't modified at all. Checked it when I first got the problem and after a new reboot, to see if that made it work, it had nothing new in it.
I can try to see if I'm able to reproduce this and check again. Tried a whole bunch of stupid things without any success so I just reinstalled.
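For anyone who hits this again, checking the bottom of vm.log together with its modification time shows whether the failed start wrote anything at all. A minimal sketch, assuming a hypothetical show_log_tail helper and that the log lives at $HOME/.dlite/vm.log, as the path quoted above suggests:

```shell
#!/bin/sh
# Sketch only: show_log_tail is a hypothetical helper, not part of dlite.

# show_log_tail FILE [LINES]
# Prints the file's ls -l entry (including its modification time) and then
# its last LINES lines (40 by default). Fails if the file does not exist.
show_log_tail() {
    file="$1"
    lines="${2:-40}"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    ls -l "$file"
    tail -n "$lines" "$file"
}
```

For example: show_log_tail "$HOME/.dlite/vm.log" 20, run right after a failed dlite start.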
Hi,
still an issue. Is there any progress on this issue?
Just downloaded the binary today.
$ dlite ssh
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec
This is the biggest issue I have with dlite ATM. Any progress here? Is dlite still being developed?
@synic sorry, yes. i'm still working on this one. doing some refactoring to make things more testable and also make it easier to handle error cases, and log more debugging information.
not being able to reproduce this one makes fixing it like playing a game of whack-a-mole in the dark with a blindfold on. rather than doing that, i'm going to shuffle things around to try to isolate pieces of logic as much as possible. with that and some additional logging it should become a lot clearer when things go wrong. plus it means i can start actually writing unit tests for things, which will be nice.
it is, however, slow going. i promise it'll all be worth it in the long run though!
Anything I can do to help, let me know!
Not sure if just a coincidence but … after over an hour of dlite start and dlite stop, I just deactivated my WLAN and ran dlite start again – it started on the first try.
Maybe this helps.