
dlite not starting correctly after reboot

Open • getninjaN opened this issue 7 years ago • 18 comments

Bug Reports

  • dlite version in use (run dlite --version): dlite version 2.0.0-beta8

  • expected behavior: dlite should start correctly after reboot and make my day the best day ever.

  • actual behavior: dlite doesn't start correctly after reboot and makes the reboot day the worst day ever.

  • steps to reproduce: I haven't got a clue

TL;DR

Something seems to be wrong with extractUser, lookupUser or proxy on my machine, I don't really know...

My story

After the first install dlite starts without any problems and runs great, but after a reboot it won't start correctly. Directly after I log in I can find a dlite process but no hyperkit process in Activity Monitor. The dlite process is using 1-2 MB of RAM, which sounds small but probably isn't anything weird.

docker ps returns an error

$ docker ps
Error response from daemon: Unable to connect to the virtual machine

dlite start times out. (Two dlite processes exist while it runs; when it finishes, one process is terminated and the original process persists.)

$ dlite start
Starting the virtual machine: ERROR!
Timed out waiting for virtual machine

dlite stop hangs, running to infinity and beyond until I press Ctrl-C. (The dlite process is still running.)

Running dlite stop again after this:

$ dlite stop
Stopping the virtual machine: done

(The dlite process is still running.)

Debug mode activated

So I start digging and find out that some commands make an HTTP POST request to http://127.0.0.1:1050/[command].

  • curl -X POST http://127.0.0.1:1050/start returns Unauthorized

  • curl -X POST --header "X-Username: emil" http://127.0.0.1:1050/start returns Timed out waiting for virtual machine

  • curl -X POST http://127.0.0.1:1050/stop returns Virtual machine is not running (which is expected)

  • Visiting http://127.0.0.1:1050/status in Chrome also returns Unauthorized
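The curl experiments above suggest that the daemon on 127.0.0.1:1050 rejects requests that lack an X-Username header. The following is a minimal Python sketch for repeating those checks from a script. It assumes only the endpoint layout and header name observed in the curl calls. The helper name call_dlite is hypothetical. Whether the daemon expects exactly your macOS login name in that header is also an assumption, not something confirmed here.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def call_dlite(command: str, username: Optional[str] = None,
               base_url: str = "http://127.0.0.1:1050",
               timeout: float = 5.0) -> Tuple[int, str]:
    """POST to <base_url>/<command>, optionally with an X-Username header.

    Returns (HTTP status code, response body). Error responses such as
    401 Unauthorized are returned instead of raised so they can be inspected.
    Connection failures (daemon not listening) still raise urllib.error.URLError.
    """
    # Empty body, like `curl -X POST` with no data.
    req = urllib.request.Request(f"{base_url}/{command}", data=b"", method="POST")
    if username is not None:
        # Hypothetical usage: header name taken from the curl call above.
        req.add_header("X-Username", username)
    # Ignore http_proxy and similar variables: the daemon listens on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()
```

For example, call_dlite("start") would be expected to reproduce the Unauthorized response. call_dlite("start", "emil") sends the header the same way the second curl call did.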

It seems like there's something wrong with extractUser, lookupUser or proxy.

I have tried uninstalling everything (I think) and reinstalling dlite, with the same results. I have also tried unloading local.docker.plist and loading it again, with the same results.

getninjaN, Dec 01 '16

I'm experiencing the same symptoms, except that http://127.0.0.1:1050/status returns a valid-looking status:

{"id":"a54321e06-be54-11e6-9769-7056818e1367","hostname":"local.docker","disk_size":20,"disk_path":"/Users/kakoni/.dlite/disk.qcow","cpu_cores":2,"memory":2,"dns_server":"192.168.64.1","docker_version":"latest","docker_args":"","route":true,"started":true,"ip":"192.168.64.7","pid":8549}
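This payload reports "started": true along with an IP and a PID, yet docker ps cannot reach the VM. In other words, the daemon's view of the VM and the VM's actual reachability disagree. Below is a small Python sketch for pulling those fields out of the /status response. The field names are copied from the payload above, and the helper summarize_status is hypothetical.

```python
import json


def summarize_status(raw: str) -> str:
    """Summarize a dlite /status JSON payload (field names as seen in this thread)."""
    status = json.loads(raw)
    if not status.get("started"):
        return "vm not started"
    # "started" only reflects the daemon's bookkeeping, not whether Docker answers.
    return f"vm started: ip={status.get('ip')} pid={status.get('pid')}"
```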

kakoni, Dec 09 '16

My story

reboot

$ docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

$ ls -l /var/run/docker.sock
srwxrwxrwx 1 root daemon 0 13 Dec 23:03 /var/run/docker.sock

$ sudo rm -rf /var/run/docker.sock
$ sudo launchctl stop local.dlite
$ sudo launchctl start local.dlite
$ dlite start
Starting the virtual machine: ERROR!
Timed out waiting for virtual machine
$ dlite status
vm_state: started
ip_address: 192.168.64.4
pid: 963
id: aceb2746-a7d0-11e6-affc-80e6502222b0
hostname: local.docker
disk_size: 25
disk_path: /Users/merkushin/.dlite/disk.qcow
cpu_cores: 2
memory: 3
dns_server: 192.168.64.1
docker_version: latest
docker_args: --bip=172.17.0.1/24 --dns=172.17.0.1

$ docker ps
Error response from daemon: Unable to connect to the virtual machine

Waiting 1 minute...

$ docker ps
CONTAINER ID   IMAGE                           COMMAND     CREATED       STATUS         PORTS                   NAMES
7b5e00b68d69   aacebedo/dnsdock:latest-amd64   "dnsdock"   4 weeks ago   Up 2 seconds   172.17.0.1:53->53/udp   dnsdock
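In the transcript above, dlite status reports the VM as started while docker ps still fails. About a minute later, docker ps succeeds. That suggests the VM may simply need more time than dlite start waits for. The following is a hedged Python polling sketch that assumes the docker CLI is on PATH. wait_until and docker_responds are hypothetical helpers. The 120 s timeout and 5 s interval are arbitrary choices, not dlite settings.

```python
import subprocess
import time
from typing import Callable


def docker_responds() -> bool:
    """Return True if `docker ps` exits with status 0."""
    try:
        result = subprocess.run(["docker", "ps"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        # docker binary missing, or the command hung.
        return False
    return result.returncode == 0


def wait_until(check: Callable[[], bool], timeout: float = 120.0,
               interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

wait_until(docker_responds) returns True once docker ps exits with status 0, or False after the timeout. The sleep and clock parameters exist only so the loop can be exercised without real waiting.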

bibendi, Dec 13 '16

can those of you experiencing this issue run dlite ssh and tell me the version of dlite-os shown?

if you have a version earlier than 1.0.0-beta3, this problem should be fixed by re-running dlite init. if you have dlite-os 1.0.0-beta3 and you're still experiencing this issue, i have more debugging to do.

nlf, Dec 13 '16

$ dlite ssh
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec

getninjaN, Dec 14 '16

$ dlite ssh
[email protected]'s password:

ctrl + c =)

$ ssh docker@$(dlite ip)
[email protected]'s password:
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec

bibendi, Dec 19 '16

Something just hit me: I downloaded the binary from "Releases" and did not build it myself. Could that be what's causing my trouble?

getninjaN, Dec 28 '16

@getninjaN that shouldn't be causing a problem, that's the binary i run on my own laptop without issues.

interesting that the vm seems to be coming up and just isn't phoning home correctly. can someone who is able to log in to their vm run df and tell me the available disk space on their vm?

nlf, Dec 31 '16

i'm getting this same issue (beta8) and

$ dlite ssh
ssh: connect to host local.docker port 22: Operation timed out
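An "Operation timed out" on port 22 means the host could not open a TCP connection to the VM at all. That is a different failure from the daemon answering with an error. Here is a minimal Python sketch for checking the distinction. It assumes local.docker resolves the same way it did in the ssh error above, and port_open is a hypothetical helper.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, if port_open("local.docker", 22) returns False, that matches the ssh timeout above. A True result would point the investigation back at the daemon side.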

maeldur, Jan 22 '17

I managed to fix this.

TL;DR

PEBKAC 🙃

The long story

  1. $ dlite stop (2.0.0-beta8 or beta9)
  2. Run Docker Toolbox Uninstall Script
  3. $ brew uninstall docker docker-machine
  4. $ dlite uninstall (2.0.0-beta8 or beta9)
  5. $ brew uninstall dlite (2.0.0-beta8 or beta9)
  6. $ brew install dlite (1.1.5)
  7. $ brew uninstall dlite (1.1.5)
  8. Restart macOS
  9. $ brew install docker-compose
  10. Download dlite 2.0.0-beta9
  11. $ cp dlite /usr/local/bin
  12. $ dlite init
  13. Got DISK ERROR! as in #217
  14. $ brew install libev (To fix DISK ERROR!)
  15. $ dlite init
  16. docker-compose up and when this was done...
  17. ... restart macOS
  18. $ dlite start
  19. ...
  20. PROFIT!

Conclusion

Now everything is working like clockwork again. The original problem was probably caused by a combination of Kitematic, Docker for Mac and dlite 1.1.5: I used them in turn without properly uninstalling each one before switching to the next.

getninjaN, Feb 10 '17

I'm tired of waiting for a fix for this problem 😸 docker-machine-driver-xhyve is working like a charm

bibendi, Feb 13 '17

Well sh*t... Ran into another problem now. My Mac was acting up and I had to kill it with the power switch.

Now when I try to run dlite start I get this error

Starting the virtual machine: ERROR!
chown /Users/emil/.dlite/vm.tty: no such file or directory

In my console I get this for InternetSharing (/usr/libexec/InternetSharing)

2017-02-15 17:03:20.636867
com.docker.hyperkit: com.apple.NetworkSharing.broadcast-1 has been started
2017-02-15 17:03:20.650614
com.docker.hyperkit: com.apple.NetworkSharing.broadcast-1 (idle) has been stopped

Peace and love

getninjaN, Feb 15 '17

well that's a new one. i've actually seen the no such file or directory error for vm.tty before, though, so i'll open an issue for that specifically.

the InternetSharing stuff though, that's a new one. is there anything interesting in /Users/emil/.dlite/vm.log? likely at the very bottom

nlf, Feb 15 '17

Nope, vm.log wasn't modified at all. I checked it when I first got the problem. I checked it again after another reboot, to see whether that made it work, and it had nothing new in it.

I can try to see if I'm able to reproduce this and check again. Tried a whole bunch of stupid things without any success so I just reinstalled.

getninjaN, Feb 15 '17

Hi,

This is still an issue. Is there any progress on it?

Just downloaded the binary today.

$ dlite ssh
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec

KiSchulte, Mar 13 '17

This is the biggest issue I have with dlite ATM. Any progress here? Is dlite still being developed?

synic, Apr 05 '17

@synic sorry, yes. i'm still working on this one. doing some refactoring to make things more testable and also make it easier to handle error cases, and log more debugging information.

not being able to reproduce this one makes fixing it like playing a game of whack-a-mole in the dark with a blindfold on, rather than doing that i'm going to shuffle things around to try to isolate pieces of logic as much as possible. with that and some additional logging it should become a lot more clear when things go wrong. plus it means i can start actually writing unit tests for things, which will be nice.

it is, however, slow going. i promise it'll all be worth it in the long run though!

nlf, Apr 05 '17

Anything I can do to help, let me know!

synic, Apr 05 '17

Not sure if just a coincidence but … after over an hour of dlite start and dlite stop, I just deactivated my WLAN and ran dlite start again – it started on the first try.

Maybe this helps.

WoodrowShigeru, Nov 29 '18