csi-powermax icon indicating copy to clipboard operation
csi-powermax copied to clipboard

fix: add way to filter network interfaces

Open erhudy opened this issue 1 year ago • 10 comments

Description

Under certain circumstances, net.Interfaces() will begin returning interfaces in unusual orders (e.g. if the interface index goes above 255 or 256). This could cause the driver to select the wrong interface, which in turn would mean that it could not find a matching VM in vSphere and would crash. This changeset adds arguments to control what interfaces can be selected by the driver.

GitHub Issues

List the GitHub issues impacted by this PR:

GitHub Issue #

Checklist:

  • [ ] Have you run format,vet & lint checks against your submission?
  • [ ] Have you made sure that the code compiles?
  • [ ] Did you run the unit & integration tests successfully?
  • [ ] Have you maintained at least 90% code coverage?
  • [ ] Have you commented your code, particularly in hard-to-understand areas
  • [ ] Have you done corresponding changes to the documentation
  • [ ] Did you run tests in a real Kubernetes cluster?
  • [ ] Backward compatibility is not broken

How Has This Been Tested?

Built an image and deployed it in the affected Kubernetes cluster, along with changes to the Helm templates and values to surface the new environment variables so that they can be configured as needed.

erhudy avatar Mar 05 '24 16:03 erhudy

hi @erhudy , thank you for submitting the PR . We have already passed the code freeze date for CSM 1.10 release and we will consider this for CSM 1.11 .

boyamurthy avatar Mar 07 '24 05:03 boyamurthy

Sorry, this fell off the radar. The general purpose of includeFilter is to get it to select a particular interface, e.g. ens192 is the interface on our VMware VMs. That being said, I think you're right that it doesn't work exactly as intended in its present form, but it could probably be simplified to allow just specifying a particular interface name and that would avoid all the dynamic interface selection stuff entirely.

As far as testing this goes, I don't think I could run the integration test suite at that time because of some reason unrelated to the PR (it was failing on the main branch as well so probably some environment problem). The PR itself has been in use at my company since I made it, in order to filter out Calico's interfaces from selection.

erhudy avatar Apr 30 '24 16:04 erhudy

How has this been tested, please attach test logs to the PR

I simplified this PR by dropping IfaceIncludeFilter, so now it only excludes interfaces. Let me know what other changes you would like to see.

erhudy avatar May 05 '24 00:05 erhudy

How has this been tested, please attach test logs to the PR

I simplified this PR by dropping IfaceIncludeFilter, so now it only excludes interfaces. Let me know what other changes you would like to see.

hi @erhudy, could you rebase the branch with latest main.

boyamurthy avatar May 07 '24 09:05 boyamurthy

How has this been tested, please attach test logs to the PR

I simplified this PR by dropping IfaceIncludeFilter, so now it only excludes interfaces. Let me know what other changes you would like to see.

hi @erhudy, could you rebase the branch with latest main.

Rebased off main and fixed the linting error. I built a new container image from this branch after all my changes and am using it now in an internal test cluster, with the config option X_CSI_IFACE_EXCLUDE_FILTER="^cali.+" to exclude Calico interfaces.

erhudy avatar May 07 '24 13:05 erhudy

How has this been tested, please attach test logs to the PR

I simplified this PR by dropping IfaceIncludeFilter, so now it only excludes interfaces. Let me know what other changes you would like to see.

hi @erhudy, could you rebase the branch with latest main.

Rebased off main and fixed the linting error. I built a new container image from this branch after all my changes and am using it now in an internal test cluster, with the config option X_CSI_IFACE_EXCLUDE_FILTER="^cali.+" to exclude Calico interfaces.

Thank you @erhudy .

boyamurthy avatar May 08 '24 06:05 boyamurthy

@nidtara @boyamurthy : Please review the latest changes

shanmydell avatar May 14 '24 12:05 shanmydell

@erhudy , can you rebase this with the main? Thanks.

delldubey avatar Jun 25 '24 10:06 delldubey

@delldubey branch updated.

Unrelated to this branch, but if I have a crash report with an NPE, what's the best way to report that? The issues tracker on this repository is not open to me.

erhudy avatar Jun 25 '24 23:06 erhudy

@erhudy , here csm/issues

delldubey avatar Jul 12 '24 08:07 delldubey

What is required to complete this?

If somebody is waiting for me to provide unit test output, I cannot run the unit tests locally for reasons that do not appear to have anything to do with my changes:

( cd service; go clean -cache; CGO_ENABLED=0 GO111MODULE=on go test -v -coverprofile=c.out ./... )
=== RUN   TestApiRouter2
time="2024-10-14T23:18:31Z" level=info msg="starting http server on port :abc"
time="2024-10-14T23:18:31Z" level=error msg="unable to start http server to serve status requests due to listen tcp: lookup tcp/abc: unknown port"
time="2024-10-14T23:18:31Z" level=info msg="started http server to serve status requests at :abc"
--- PASS: TestApiRouter2 (0.00s)
=== RUN   TestApiRouter
time="2024-10-14T23:18:31Z" level=info msg="starting http server on port :8083"
time="2024-10-14T23:18:33Z" level=info msg="connectivityStatus called, status is <nil> \n"
time="2024-10-14T23:18:33Z" level=error msg="error probeStatus map in cache is empty"
time="2024-10-14T23:18:33Z" level=info msg="connectivityStatus called, status is &{{0 0} {[] {} 0x400061fb00} map[SymID2:0x40000801f0] 0} \n"
time="2024-10-14T23:18:33Z" level=error msg="invalid data is stored in cache"
time="2024-10-14T23:18:33Z" level=error msg="error invalid data is stored in cache during marshaling to json"
time="2024-10-14T23:18:33Z" level=info msg="connectivityStatus called, status is &{{0 0} {[] {} 0x40003d8330} map[SymID:0x40003de078] 0} \n"
time="2024-10-14T23:18:33Z" level=info msg="sending connectivityStatus for all arrays "
time="2024-10-14T23:18:33Z" level=info msg="GetArrayConnectivityStatus called for array SymIDNotPresent \n"
time="2024-10-14T23:18:33Z" level=info msg="GetArrayConnectivityStatus called for array SymID3 \n"
time="2024-10-14T23:18:33Z" level=error msg="error json: unsupported type: chan int during marshaling to json"
time="2024-10-14T23:18:33Z" level=info msg="GetArrayConnectivityStatus called for array SymID \n"
time="2024-10-14T23:18:33Z" level=info msg="sending response {LastSuccess:1728947913 LastAttempt:1728947913} for array SymID \n"
--- PASS: TestApiRouter (2.00s)
=== RUN   TestMarshalSyncMapToJSON
=== RUN   TestMarshalSyncMapToJSON/storing_valid_value_in_map_cache
=== RUN   TestMarshalSyncMapToJSON/storing_valid_value_in_map_cache#01
time="2024-10-14T23:18:33Z" level=error msg="invalid data is stored in cache"
--- PASS: TestMarshalSyncMapToJSON (0.00s)
    --- PASS: TestMarshalSyncMapToJSON/storing_valid_value_in_map_cache (0.00s)
    --- PASS: TestMarshalSyncMapToJSON/storing_valid_value_in_map_cache#01 (0.00s)
=== RUN   TestStartAPIService
time="2024-10-14T23:18:33Z" level=info msg="startNodeToArrayConnectivityCheck is running probes at pollingFrequency 30 "
time="2024-10-14T23:18:33Z" level=info msg="starting http server on port :8083"
time="2024-10-14T23:18:33Z" level=error msg="unable to start http server to serve status requests due to listen tcp :8083: bind: address already in use"
time="2024-10-14T23:18:33Z" level=info msg="started http server to serve status requests at :8083"
--- PASS: TestStartAPIService (0.00s)
=== RUN   TestStartAPIServiceNoPodmon
time="2024-10-14T23:18:33Z" level=info msg="podmon is not enabled"
--- PASS: TestStartAPIServiceNoPodmon (0.00s)
=== RUN   TestGoDog
starting godog...
Feature: PowerMax CSI interface
  As a consumer of the CSI interface
  I want to test controller publish / unpublish interfaces
  So that they are known to work
startup time: 2.097824036s
goroutines start aPowerMaxService new 16 old groutines 0
&{CSI-Test-Node-1 0 1 0 false false   iSCSI [iqn.1993-08.org.centos:01:5ae577b352a0] [] [] 0 0}
&{CSI-Test-Node-2 0 2 0 false false   iSCSI [iqn.1993-08.org.centos:01:5ae577b352a1 iqn.1993-08.org.centos:01:5ae577b352a2] [] [] 0 0}
&{CSI-Test-Node-3-FC 0 2 0 false false   Fibre [20000090fa9278dd 20000090fa9278dc] [] [] 0 0}
*****added** &{DEL-snapshot-1 0 false false 719899255 Established}*******Total Snaps on 00001 are: 1*********added** &{DEL-snapshot-2 0 false false 719916755 Established}*******Total Snaps on 00002 are: 1****time="2024-10-14T23:18:33Z" level=info msg="server url: http://127.0.0.1:40423"
time="2024-10-14T23:18:33Z" level=info msg="Couldn't read ../../gopowermax/mock/symmetrix46.json"
time="2024-10-14T23:18:33Z" level=error msg="Failed to fetch details for array: 000197900046. [Not Found]"
time="2024-10-14T23:18:33Z" level=info msg="Couldn't read ../../gopowermax/mock/symmetrix47.json"
time="2024-10-14T23:18:33Z" level=error msg="Failed to fetch details for array: 000197900047. [Not Found]"
time="2024-10-14T23:18:33Z" level=info msg="Couldn't read ../../gopowermax/mock/symmetrix13.json"
time="2024-10-14T23:18:33Z" level=error msg="Failed to fetch details for array: 000000000013. [Not Found]"
time="2024-10-14T23:18:33Z" level=error msg="None of the managed arrays specified are locally connected"
FAIL	github.com/dell/csi-powermax/v2/service	2.149s
FAIL
make: *** [Makefile:64: unit-test] Error 1

erhudy avatar Oct 14 '24 23:10 erhudy