mgmtfn/k8splugin: refactor to remove nsenter
This PR rewrites the network setup for Kubernetes so that it no longer uses nsenter. No changes have been made to the unit tests; the code should behave exactly like the previous implementation.
Errors are now returned to the caller.
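For illustration, the general shape of the change is sketched below. This is a minimal sketch rather than the exact code in this PR, assuming the vishvananda/netlink and vishvananda/netns libraries; the package name, function name, and parameters are illustrative.

```go
// Minimal sketch (not the exact code in this PR): move an interface
// into a pod's network namespace via netlink syscalls instead of
// shelling out to nsenter. Package and function names are illustrative.
package k8snet

import (
	"fmt"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

// moveToPodNs moves the interface ifName into the network namespace
// referenced by nsPath (e.g. /proc/<pid>/ns/net). All failures are
// returned to the caller instead of being logged and swallowed.
func moveToPodNs(ifName, nsPath string) error {
	link, err := netlink.LinkByName(ifName)
	if err != nil {
		return fmt.Errorf("could not find link %q: %v", ifName, err)
	}

	ns, err := netns.GetFromPath(nsPath)
	if err != nil {
		return fmt.Errorf("could not open netns %q: %v", nsPath, err)
	}
	defer ns.Close()

	// NsHandle is an open file descriptor for the target namespace.
	if err := netlink.LinkSetNsFd(link, int(ns)); err != nil {
		return fmt.Errorf("could not move %q into %q: %v", ifName, nsPath, err)
	}
	return nil
}
```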
@unclejack thanks for removing the nsenter dependency. Please be sure to run the system tests manually in k8s mode (they are not part of sanity).
@unclejack Have you been able to test k8s sanity with this? If not, can you verify simple sanity manually with vagrant? Also, can you please retrigger sanity?
@jojimt: I haven't been able to do that so far, but I'll try again.
Can you verify with this procedure for now: https://github.com/contiv/netplugin/tree/master/mgmtfn/k8splugin
It seems like your latest commit did not trigger sanity. Can you please trigger it so that I can merge?
@jojimt: Have you been able to test k8s? I haven't had a chance to do it so far. I'll push again to trigger the CI.
No, @unclejack, you need to test that. I gave you an alternative option above for performing that test.
@unclejack, now that the k8s sanity is available, can you please run it with your changes?
@jojimt: Sure, I'll take care of it.
@jojimt: I'm sorry, but k8s-test is still broken:
github.com/contiv/netplugin/vendor/github.com/docker/engine-api/types
github.com/contiv/netplugin/vendor/github.com/docker/engine-api/types/reference
github.com/contiv/netplugin/vendor/github.com/docker/engine-api/types/time
github.com/contiv/netplugin/vendor/github.com/docker/engine-api/client
github.com/contiv/netplugin/netplugin/agent
github.com/contiv/netplugin/version
github.com/contiv/netplugin/netplugin
github.com/contiv/netplugin/netmaster/objApi
github.com/contiv/netplugin/netmaster/daemon
github.com/contiv/netplugin/netmaster
github.com/contiv/netplugin/vendor/github.com/codegangsta/cli
github.com/contiv/netplugin/vendor/github.com/contiv/contivmodel/client
github.com/contiv/netplugin/netctl
github.com/contiv/netplugin/netctl/netctl
github.com/contiv/netplugin/mgmtfn/k8splugin/contivk8s/clients
github.com/contiv/netplugin/mgmtfn/k8splugin/contivk8s
github.com/contiv/netplugin/mgmtfn/mesosplugin/netcontiv
Connection to 127.0.0.1 closed.
CONTIV_K8=1 cd vagrant/k8s/ && ./start_sanity_service.sh
ERROR! the playbook: ./contrib/ansible/cluster.yml could not be found
make: *** [k8s-test] Error 1
Can you ping @abhinandanpb to determine whether this is a breakage or just a lack of documentation on how to run it?
#761 and #762 have been sent to fix issues with the Kubernetes tests and cluster setup.
More work is needed to get the Kubernetes environment working properly. I'll send some more PRs; @abhinandanpb is also working on this.
This is currently blocked by the following test failure, encountered with make k8s-test:
time="Mar 2 00:14:21.598302421" level=error msg="Error making POST request: Err: 100: Key not found (/contiv.io/state/eps) [140450]\n"
time="Mar 2 00:14:21.598371822" level=error msg="Error creating ep. Err: 100: Key not found (/contiv.io/state/nets) [140450]\n"
time="Mar 2 00:14:21.598404538" level=error msg="Handler for POST /ContivCNI.AddPod returned error: 100: Key not found (/contiv.io/state/nets) [140450]\n"
==========================================
time="2017-03-02T02:14:23+02:00" level=info msg="============================= systemtestSuite.TestTriggerNetpluginUplinkUpgrade completed =========================="
----------------------------------------------------------------------
FAIL: trigger_test.go:16: systemtestSuite.TestTriggerNetpluginUplinkUpgrade
trigger_test.go:40:
// Verify uplink state on each node
c.Assert(node.verifyUplinkState([]string{singleUplink}), IsNil)
... value *errors.errorString = &errors.errorString{s:"Lookup failed for uplink Port eth2. Err: Process exited with: 1. Reason was: ()"} ("Lookup failed for uplink Port eth2. Err: Process exited with: 1. Reason was: ()")
time="2017-03-02T02:14:23+02:00" level=info msg="============================= systemtestSuite.TestTriggerNodeReload starting =========================="
time="2017-03-02T02:14:23+02:00" level=info msg="Stopping netplugin on k8node-02"
time="2017-03-02T02:14:24+02:00" level=info msg="Cleaning up slave on k8node-02"
PR #769 makes these tests faster and more reliable. PR #762 addresses some other issues that cause failures in these tests.
@jojimt The Kubernetes cluster started by CONTIV_K8=1 make k8s-sanity-cluster doesn't seem healthy. Tests pass and fail at random. If the first tests fail, the cluster needs to be shut down and started again; that was the only way I could get it back to a working state. CPU usage and disk I/O are also quite high even when no tests are running (at least 100% CPU on k8node-01, k8node-02, and k8node-03, and ~100 MB written to the host's disk every few seconds).
@unclejack are you running it on a laptop? You might need to use a server instead.
build PR
@unclejack there's a merge conflict here that needs to be resolved first
@dseevr: I was checking to make sure the CI is ok. This needs to wait a bit longer.