nixos-infect

Losing network on OVH VPS

Open charlycoste opened this issue 8 years ago • 15 comments

I tried each of these on a VPS 2016 SSD 3 from OVH:

  • Debian 8
  • Debian 9
  • Ubuntu 16.04

Each time I run the script, installation and reboot go well. But after that, I can't connect to the VPS anymore. I accessed it through KVM to debug, and it seems that the VPS simply gets disconnected from the network.

charlycoste avatar Jul 16 '17 23:07 charlycoste

The network detection part of the script is not very robust and assumes DigitalOcean's idiosyncrasies. Likely, it didn't grab the right settings for your host. I'll probably improve it sometime in the future, but for now, try the following to manually provision your hosts:

(if you have console access to an already-provisioned host, do only step 5, instead using the ip info provided in the OVH web UI, and then nixos-rebuild switch)

  1. copy over the nixos-infect script
  2. edit it
  3. comment out the last 4 lines (makeSwap to reboot)
  4. run source nixos-infect; set +e. This will generate the config files but not try to install everything yet.
  5. edit /etc/nixos/networking.nix, correcting any obvious errors. Use commands ip addr, ip route, and cat /etc/resolv.conf to obtain any missing info. Probably remove eth1 entirely.
  6. edit nixos-infect again, and uncomment the lines you commented
  7. bash -x nixos-infect
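
For step 5, the values to check against networking.nix can be pulled out of the command output with a couple of small filters. This is only a sketch: `default_gateway` and `nameservers` are hypothetical helper names, not part of nixos-infect, and they assume the usual `default via <gateway> dev <interface>` line from `ip route` and standard `nameserver` lines in resolv.conf.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nixos-infect). Each one reads
# command output on stdin and prints the values of interest.

# Print the gateway from the "default via <gw> dev <if>" line of `ip route`.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Print every nameserver address found in resolv.conf-style input.
nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Usage on the host being provisioned:
#   ip route | default_gateway
#   nameservers < /etc/resolv.conf
```

Compare what these print with the gateway and nameserver entries generated in /etc/nixos/networking.nix.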

Post me the networking.nix contents, if you still can't get it working.

elitak avatar Jul 17 '17 00:07 elitak

Yeah, for OVH, don't even bother with the networking, just rip it out completely.

Change this:

  imports = [
    ./hardware-configuration.nix
    ./networking.nix # generated at runtime by nixos-infect
    $NIXOS_IMPORT
  ];

to this:

  imports = [
    ./hardware-configuration.nix
    $NIXOS_IMPORT
  ];

I just successfully installed on Debian 9 doing that.
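
The same edit can be scripted instead of done by hand. A minimal sketch, assuming the import line looks exactly like the one quoted above; `strip_networking_import` and the output file name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, dropping any line that
# mentions ./networking.nix (the import shown above).
strip_networking_import() {
  sed '/\.\/networking\.nix/d'
}

# Usage (writes a patched copy and leaves the original untouched):
#   strip_networking_import < nixos-infect > nixos-infect.no-net
```

The pattern requires the leading `./`, so it does not match absolute paths such as /etc/nixos/networking.nix.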

kniteli avatar Jul 14 '18 11:07 kniteli

I should probably add a --no-networking option that does this, or detect when the original system's network uses dhcp instead of manual config.
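
One possible shape for the DHCP detection, as a rough sketch only. It assumes a Debian-style host configured through ifupdown, where a DHCP interface is declared as `iface <name> inet dhcp`; `uses_dhcp` is a hypothetical name and nothing like it exists in the script today.

```shell
#!/bin/sh
# Hypothetical helper, not part of nixos-infect: succeed when an
# ifupdown-style config file declares at least one DHCP interface,
# e.g. "iface eth0 inet dhcp".
uses_dhcp() {
  grep -Eq '^[[:space:]]*iface[[:space:]]+[^[:space:]]+[[:space:]]+inet6?[[:space:]]+dhcp' "$1"
}

# Usage on a Debian-style host (path is the ifupdown default):
#   if uses_dhcp /etc/network/interfaces; then echo "DHCP in use"; fi
```

Some images keep their interface definitions in separate files under /etc/network/interfaces.d/, so a real check would probably need to look there as well.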

elitak avatar Jul 15 '18 07:07 elitak

Is this still an issue, since we do this?

asymmetric avatar Apr 20 '19 14:04 asymmetric

I'll try to check it out, then tell you if it's okay now.

charlycoste avatar Apr 24 '19 17:04 charlycoste

@asymmetric Yes, this is still an issue.

charlycoste avatar May 05 '19 14:05 charlycoste

I'm having big issues getting it to run on master. Would any of you (@asymmetric, @charlycoste, @kniteli) have a functioning version? I could build on it to set up a PR that makes master work on OVH.

TheSirC avatar Jul 03 '19 14:07 TheSirC

Don't use OVH, sorry.

asymmetric avatar Jul 04 '19 09:07 asymmetric

@TheSirC if you still need help, attach a .log here with the output after you set -x and run the script (comment out the reboot), because I have no detail on what your "big issues" are.

elitak avatar Jul 04 '19 19:07 elitak

@elitak Yes, of course. Here is the error log:

Error log
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 11.4978 s, 93.4 MB/s
swapon: /tmp/nixos-infect.rbzUv.swp: found swap signature: version 1d, page-size 4, same byte order
swapon: /tmp/nixos-infect.rbzUv.swp: pagesize=4096, swapsize=1073741824, devsize=1073741824

The problem is that, even with the reboot command enabled, the system does not reboot, and the instance does not end up with the usual NixOS commands (nix-env, nixos-rebuild, etc.). I "bisected" the run and found that execution reaches the makeConf function and then just "doesn't execute it" (I cannot find any trace on the system of the commands in there). The script exits with error code 1.

TheSirC avatar Jul 05 '19 16:07 TheSirC

Run set -x, before you run the script, for more detail; that should get you the exact line that fails.

elitak avatar Jul 05 '19 16:07 elitak

I had actually added it to the script itself, which produced no further output. I then also ran it in the interactive session, with this output:

root@address:~# ./nixos-infect
+ ./nixos-infect

It then immediately returned to the interactive prompt.

TheSirC avatar Jul 05 '19 18:07 TheSirC

After further testing (and multiple reinstalls of the VPS to make sure I was working on a clean system each time), I found that:

  1. The script stops here, on the grep part. Running the command myself returns an empty string, which is totally normal since the file exists but is empty. grep treats that as a failure, though: it returns exit code 1.
  2. Here the script does not refresh the package list (e.g. apt-get update), which can make it fail.
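
To illustrate the first point in general terms: grep exits with status 1 when nothing matches, and a script running under `set -e` aborts on that. A minimal sketch of one way to tolerate "no match" while still failing on real errors; `grep_optional` is a hypothetical name and this is not necessarily the fix used in the script:

```shell
#!/bin/sh
# Hypothetical wrapper: print matching lines and treat "no match"
# (grep status 1) as success, while still failing on real errors
# (status 2 or higher, e.g. a missing file).
grep_optional() {
  grep "$@" || [ "$?" -eq 1 ]
}

# Usage inside a script running under `set -e`:
#   servers=$(grep_optional '^nameserver' /etc/resolv.conf)
```

A plain `grep ... || true` also avoids the abort, but it hides genuine failures such as an unreadable file.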

Fun fact: the following commands do not produce the same output, and I would really like to know why:

  1. bash -x script (<-- I ran this one to have output on the script)
  2. adding set -x to the script after the shebang
  3. doing set -x in your interactive prompt and then running ./script
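
Part of the difference has a general explanation: `set -x` is an option of the shell process it is set in, and a script started as a separate process does not inherit it. That would account for variant 3 printing only the invocation line. The sketch below uses a hypothetical one-line stand-in script, not nixos-infect itself, and does not explain any difference between variants 1 and 2.

```shell
#!/bin/sh
# Stand-in for the real script (hypothetical content).
demo=$(mktemp)
printf 'echo hello\n' > "$demo"

# Tracing requested on the interpreter that runs the script:
# the trace on stderr includes the script's own commands.
traced_child=$(sh -x "$demo" 2>&1 >/dev/null)

# Tracing enabled only in the calling shell: the trace shows the
# invocation of the script but nothing from inside it.
traced_parent=$( (set -x; sh "$demo") 2>&1 >/dev/null)

printf 'child traced:\n%s\n\nparent traced:\n%s\n' "$traced_child" "$traced_parent"
rm -f "$demo"
```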

TheSirC avatar Jul 06 '19 07:07 TheSirC

So after applying patches for the above-cited issues I opened a pull-request that worked for me on OVH.

TheSirC avatar Jul 06 '19 08:07 TheSirC

Resolved 3 years ago...

Anyway, I have a similar problem on a different budget provider. It's kind of weird: the VPS does respond to ping, but it seems to have all ports closed. Any ideas what the problem is?

If I have enough time, I'll try to do step by step what this script does and play a little with the config. Maybe #61 is the solution?

raspher avatar Jul 31 '22 12:07 raspher