
mgmtfn/k8splugin: refactor to remove nsenter

Open unclejack opened this issue 8 years ago • 19 comments

This PR rewrites the network setup for Kubernetes so that it no longer uses nsenter. No changes have been made to the unit test. The code should behave just like the previous code.

Errors are now passed down to the caller.

unclejack avatar Nov 11 '16 16:11 unclejack

@unclejack thanks for removing nsenter dependency. Please be sure to run system tests manually in the k8s mode (it is not part of sanity).

jojimt avatar Nov 11 '16 16:11 jojimt

@unclejack Have you been able to test k8s sanity with this? If not, can you verify simple sanity manually with vagrant? Also, can you please retrigger sanity?

jojimt avatar Jan 05 '17 19:01 jojimt

@jojimt: I haven't been able to do that so far, but I'll try again.

unclejack avatar Jan 05 '17 19:01 unclejack

Can you verify with this procedure for now: https://github.com/contiv/netplugin/tree/master/mgmtfn/k8splugin

jojimt avatar Jan 05 '17 19:01 jojimt

It seems like your latest commit did not trigger sanity. Can you please trigger it? Then I can merge.

jojimt avatar Jan 06 '17 19:01 jojimt

@jojimt: Have you been able to test k8s? I haven't had a chance to do it so far. I'll push again to trigger the CI.

unclejack avatar Jan 09 '17 18:01 unclejack

No, @unclejack you need to test that. I gave you an alternative option above to perform that test.

jojimt avatar Jan 09 '17 18:01 jojimt

@unclejack, now that the k8s sanity is available, can you please run it with your changes?

jojimt avatar Feb 27 '17 06:02 jojimt

@jojimt: Sure, I'll take care of it.

unclejack avatar Feb 27 '17 09:02 unclejack

@jojimt: I'm sorry, but k8s-test is still broken:

github.com/contiv/netplugin/vendor/github.com/docker/engine-api/types
github.com/contiv/netplugin/vendor/github.com/docker/engine-api/types/reference
github.com/contiv/netplugin/vendor/github.com/docker/engine-api/types/time
github.com/contiv/netplugin/vendor/github.com/docker/engine-api/client
github.com/contiv/netplugin/netplugin/agent
github.com/contiv/netplugin/version
github.com/contiv/netplugin/netplugin
github.com/contiv/netplugin/netmaster/objApi
github.com/contiv/netplugin/netmaster/daemon
github.com/contiv/netplugin/netmaster
github.com/contiv/netplugin/vendor/github.com/codegangsta/cli
github.com/contiv/netplugin/vendor/github.com/contiv/contivmodel/client
github.com/contiv/netplugin/netctl
github.com/contiv/netplugin/netctl/netctl
github.com/contiv/netplugin/mgmtfn/k8splugin/contivk8s/clients
github.com/contiv/netplugin/mgmtfn/k8splugin/contivk8s
github.com/contiv/netplugin/mgmtfn/mesosplugin/netcontiv
Connection to 127.0.0.1 closed.
CONTIV_K8=1 cd vagrant/k8s/ && ./start_sanity_service.sh
ERROR! the playbook: ./contrib/ansible/cluster.yml could not be found
make: *** [k8s-test] Error 1

unclejack avatar Feb 27 '17 14:02 unclejack

Can you ping @abhinandanpb to determine whether this is a breakage or a lack of documentation on how to run it?

jojimt avatar Feb 27 '17 15:02 jojimt

#761 and #762 have been sent to fix issues with the kubernetes tests & cluster setup.

More work is needed to get to the point where the kubernetes environment works properly. I'll send some more PRs. @abhinandanpb is also working on this.

unclejack avatar Feb 28 '17 15:02 unclejack

This is currently blocked by this test failure encountered with make k8s-test:

time="Mar  2 00:14:21.598302421" level=error msg="Error making POST request: Err: 100: Key not found (/contiv.io/state/eps) [140450]\n"
time="Mar  2 00:14:21.598371822" level=error msg="Error creating ep. Err: 100: Key not found (/contiv.io/state/nets) [140450]\n"
time="Mar  2 00:14:21.598404538" level=error msg="Handler for POST /ContivCNI.AddPod returned error: 100: Key not found (/contiv.io/state/nets) [140450]\n"
==========================================

time="2017-03-02T02:14:23+02:00" level=info msg="============================= systemtestSuite.TestTriggerNetpluginUplinkUpgrade completed =========================="

----------------------------------------------------------------------
FAIL: trigger_test.go:16: systemtestSuite.TestTriggerNetpluginUplinkUpgrade

trigger_test.go:40:
    // Verify uplink state on each node
    c.Assert(node.verifyUplinkState([]string{singleUplink}), IsNil)
... value *errors.errorString = &errors.errorString{s:"Lookup failed for uplink Port eth2. Err: Process exited with: 1. Reason was:  ()"} ("Lookup failed for uplink Port eth2. Err: Process exited with: 1. Reason was:  ()")

time="2017-03-02T02:14:23+02:00" level=info msg="============================= systemtestSuite.TestTriggerNodeReload starting =========================="
time="2017-03-02T02:14:23+02:00" level=info msg="Stopping netplugin on k8node-02"
time="2017-03-02T02:14:24+02:00" level=info msg="Cleaning up slave on k8node-02"

unclejack avatar Mar 02 '17 12:03 unclejack

PR #769 makes some improvements to make these tests faster and more reliable. PR #762 addresses some other issues which cause failures in these tests.

unclejack avatar Mar 02 '17 21:03 unclejack

@jojimt The kubernetes cluster started by CONTIV_K8=1 make k8s-sanity-cluster doesn't seem healthy. Tests fail and pass at random. If the first tests have failed, the cluster needs to be shut down and started again. This was the only way I was able to fix the cluster. CPU usage and disk IO are also pretty high while not running the tests (at least 100% CPU usage for k8node-01, k8node-02 and k8node-03, and ~100 MB written to the host's disk every few seconds).

unclejack avatar Mar 03 '17 15:03 unclejack

@unclejack are you running it on a laptop? You might need to use a server instead.

jojimt avatar Mar 03 '17 16:03 jojimt

build PR

unclejack avatar Oct 30 '17 11:10 unclejack

@unclejack there's a merge conflict here that needs to be resolved first

dseevr avatar Oct 30 '17 16:10 dseevr

@dseevr: I was checking to make sure the CI is ok. This needs to wait a bit longer.

unclejack avatar Nov 03 '17 17:11 unclejack