server: throw exception if fail to cleanup IP resources when release a public IP

Open weizhouapache opened this issue 1 year ago • 22 comments

Description

When reproducing issue #8967, I got the following errors:

2024-05-03T14:24:16,885 WARN  [c.c.n.IpAddressManagerImpl] (API-Job-Executor-44:[ctx-6da31d57, job-15369, ctx-eb716451]) (logid:d43b402b) Unable to revoke all the firewall rules for ip id=2 as a part of ip release

2024-05-03T14:24:29,282 DEBUG [c.c.n.IpAddressManagerImpl] (API-Job-Executor-44:[ctx-6da31d57, job-15369, ctx-eb716451]) (logid:d43b402b) Releasing ip id=2; sourceNat = false

2024-05-03T14:24:29,271 WARN  [c.c.n.IpAddressManagerImpl] (API-Job-Executor-44:[ctx-6da31d57, job-15369, ctx-eb716451]) (logid:d43b402b) Failed to release resources for ip address id=2

2024-05-03T14:24:36,266 WARN  [c.c.n.NetworkServiceImpl] (API-Job-Executor-44:[ctx-6da31d57, job-15369, ctx-eb716451]) (logid:d43b402b) Failed to release public ip address id=2

The errors are ignored: the public IP is released successfully in CloudStack, but it is still associated with a VR. When the IP is then associated with another network, this causes an issue similar to #8967. However, the reporter of #8967 could not find any error like "Failed to release" or "Unable to revoke" in their logs, so the root cause of #8967 could be different.
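The difference between ignoring a cleanup failure and raising on it can be sketched as follows. This is a minimal illustration in Python, not CloudStack code: `IpReleaseError`, `release_ip_lenient`, `release_ip_strict`, and the `steps` list are all hypothetical names, and the failing step is simulated.

```python
class IpReleaseError(Exception):
    """Raised when a cleanup step fails during public IP release (hypothetical)."""


def release_ip_lenient(ip_id, cleanup_steps, log):
    # Behaviour described above: failures are logged and then ignored,
    # so the caller still sees a successful release.
    for name, step in cleanup_steps:
        if not step(ip_id):
            log.append(f"WARN {name} failed for ip id={ip_id}")
    return True


def release_ip_strict(ip_id, cleanup_steps, log):
    # Behaviour the title proposes: the first failed step aborts the release.
    for name, step in cleanup_steps:
        if not step(ip_id):
            log.append(f"WARN {name} failed for ip id={ip_id}")
            raise IpReleaseError(f"{name} failed for ip id={ip_id}")
    return True


# Simulated cleanup steps; the first one always fails.
steps = [
    ("revoke firewall rules", lambda ip_id: False),
    ("release resources", lambda ip_id: True),
]
```

With the lenient variant the caller cannot tell that the firewall rules were left behind; with the strict variant the release fails visibly and the IP is not handed out again in an inconsistent state.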

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [ ] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

weizhouapache avatar May 08 '24 11:05 weizhouapache

@blueorangutan package

weizhouapache avatar May 08 '24 11:05 weizhouapache

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar May 08 '24 11:05 blueorangutan

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 14.95%. Comparing base (ea9a0f4) to head (8d0aab5). Report is 199 commits behind head on 4.19.

Files Patch % Lines
...n/java/com/cloud/network/IpAddressManagerImpl.java 0.00% 5 Missing :warning:
...n/java/com/cloud/network/dao/IPAddressDaoImpl.java 0.00% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19    #9059      +/-   ##
============================================
- Coverage     14.96%   14.95%   -0.01%     
- Complexity    10995    11017      +22     
============================================
  Files          5373     5382       +9     
  Lines        469024   470133    +1109     
  Branches      58818    59924    +1106     
============================================
+ Hits          70197    70320     +123     
- Misses       391056   392024     +968     
- Partials       7771     7789      +18     
Flag Coverage Δ
uitests 4.28% <ø> (-0.04%) :arrow_down:
unittests 15.66% <0.00%> (-0.01%) :arrow_down:

codecov-commenter avatar May 08 '24 11:05 codecov-commenter

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9576

blueorangutan avatar May 08 '24 12:05 blueorangutan

@blueorangutan test matrix

weizhouapache avatar May 08 '24 14:05 weizhouapache

@weizhouapache a [SL] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

blueorangutan avatar May 08 '24 14:05 blueorangutan

[SF] Trillian test result (tid-10194)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 43853 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9059-t10194-kvm-centos7.zip
Smoke tests completed. 130 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 420.30 test_events_resource.py

blueorangutan avatar May 09 '24 03:05 blueorangutan

[SF] Trillian test result (tid-10192)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47085 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9059-t10192-xenserver-71.zip
Smoke tests completed. 130 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 336.85 test_events_resource.py

blueorangutan avatar May 09 '24 04:05 blueorangutan

[SF] Trillian test result (tid-10193)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 50852 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9059-t10193-vmware-67u3.zip
Smoke tests completed. 128 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 351.57 test_events_resource.py
test_create_pvlan_network Error 0.09 test_pvlan.py
test_02_trigger_shutdown Failure 341.73 test_safe_shutdown.py

blueorangutan avatar May 09 '24 05:05 blueorangutan

@blueorangutan package

sureshanaparti avatar Jun 24 '24 07:06 sureshanaparti

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jun 24 '24 07:06 blueorangutan

@blueorangutan package

weizhouapache avatar Jun 24 '24 07:06 weizhouapache

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jun 24 '24 07:06 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10083

blueorangutan avatar Jun 24 '24 08:06 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10085

blueorangutan avatar Jun 24 '24 08:06 blueorangutan

@blueorangutan test

sureshanaparti avatar Jun 26 '24 08:06 sureshanaparti

@sureshanaparti a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Jun 26 '24 08:06 blueorangutan

[SF] Trillian test result (tid-10634)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41982 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9059-t10634-kvm-centos7.zip
Smoke tests completed. 131 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Jun 26 '24 20:06 blueorangutan

@blueorangutan package

sureshanaparti avatar Jun 28 '24 11:06 sureshanaparti

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jun 28 '24 11:06 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10195

blueorangutan avatar Jun 28 '24 13:06 blueorangutan

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot] avatar Jun 29 '24 05:06 github-actions[bot]

@weizhouapache

I am not able to reproduce the behaviour. Do you have any other specific steps?

I followed these steps:

Create 2 sessions and execute the following APIs at the same time:

  1. Disassociate the IP address
  2. Create a firewall rule on the same IP
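Running the two calls from separate terminals leaves the timing to chance. A small harness like the one below releases both calls at nearly the same instant; it is only a sketch, and `run_concurrently` as well as the two callables you would pass to it (for example wrappers around your own API client) are assumptions, not part of CloudStack.

```python
import threading


def run_concurrently(*calls):
    """Start every callable at nearly the same moment and collect the outcomes."""
    barrier = threading.Barrier(len(calls))
    results = [None] * len(calls)

    def worker(index, call):
        barrier.wait()  # hold each thread until all of them are ready
        try:
            results[index] = ("ok", call())
        except Exception as exc:  # record the failure instead of losing it
            results[index] = ("error", exc)

    threads = [
        threading.Thread(target=worker, args=(index, call))
        for index, call in enumerate(calls)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Usage would look like `run_concurrently(disassociate_ip, create_firewall_rule)`, where both names are placeholders for functions that issue the respective API requests. Results come back in argument order, so the first tuple always belongs to the first call.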

I did not observe either message in the log:

egrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log
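The same check can be scripted, which is handy when comparing several management servers. The sketch below looks for the two messages from the description; the function names are made up for illustration, and the pattern omits the padding spaces of the egrep expression, so it is slightly broader.

```python
import re

# The two messages the egrep above looks for, without the padding spaces.
PATTERN = re.compile(r"Failed to release|Unable to revoke")


def find_release_failures(lines):
    """Return the log lines that mention a failed release or revoke."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path="/var/log/cloudstack/management/management-server.log"):
    # Decoding errors are replaced so a stray byte does not abort the scan.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_release_failures(handle)
```

An empty list from `scan_log()` corresponds to the empty egrep output reported here.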

The create firewall rule API returns the error: (HTTP 431, error code 4350) Unable to create firewall rule for the IP address ID=5 as IP is not associated with any network and no networkId is passed in

Also, I see the IP address is still present in the router even after the disassociate IP address call is successful:

root@r-4-VM:~# cat /etc/cloudstack/ips.json
{
  "eth0": [
    {
      "add": true,
      "broadcast": "10.1.1.255",
      "cidr": "10.1.1.1/24",
      "device": "eth0",
      "gateway": "",
      "netmask": "255.255.255.0",
      "network": "10.1.1.0/24",
      "nic_dev_id": "0",
      "nw_type": "guest",
      "one_to_one_nat": false,
      "public_ip": "10.1.1.1",
      "size": "24",
      "source_nat": false
    }
  ],
  "eth1": [
    {
      "add": true,
      "broadcast": "169.254.255.255",
      "cidr": "169.254.216.24/16",
      "device": "eth1",
      "gateway": "",
      "netmask": "255.255.0.0",
      "network": "169.254.0.0/16",
      "nic_dev_id": "1",
      "nw_type": "control",
      "one_to_one_nat": false,
      "public_ip": "169.254.216.24",
      "size": "16",
      "source_nat": false
    }
  ],
  "eth2": [
    {
      "add": true,
      "broadcast": "10.0.63.255",
      "cidr": "10.0.54.123/20",
      "device": "eth2",
      "first_i_p": true,
      "gateway": "10.0.48.1",
      "is_private_gateway": false,
      "mtu": "1500",
      "netmask": "255.255.240.0",
      "network": "10.0.48.0/20",
      "new_nic": false,
      "nic_dev_id": 2,
      "nw_type": "public",
      "one_to_one_nat": false,
      "public_ip": "10.0.54.123",
      "size": "20",
      "source_nat": true,
      "vif_mac_address": "1e:00:89:00:00:03"
    },
    {
      "add": false,
      "broadcast": "10.0.63.255",
      "cidr": "10.0.54.124/20",
      "device": "eth2",
      "first_i_p": false,
      "gateway": "10.0.48.1",
      "is_private_gateway": false,
      "mtu": "1500",
      "netmask": "255.255.240.0",
      "network": "10.0.48.0/20",
      "new_nic": false,
      "nic_dev_id": 2,
      "nw_type": "public",
      "one_to_one_nat": false,
      "public_ip": "10.0.54.124",
      "size": "20",
      "source_nat": false,
      "vif_mac_address": "1e:00:89:00:00:03"
    }
  ],
  "id": "ips"
}
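To compare routers without reading the whole file, the entries can be reduced to device, address, and `add` flag. This is a small sketch that assumes the structure shown in the dump above; `summarize_ips` and `load_ips` are illustrative names, and what `add: false` means for the router is not stated in this thread, so the script only reports the flag.

```python
import json


def summarize_ips(config):
    """Return (device, public_ip, add) for every address entry in the config."""
    rows = []
    for device, entries in config.items():
        if not isinstance(entries, list):  # skips the "id": "ips" marker
            continue
        for entry in entries:
            rows.append((device, entry.get("public_ip"), entry.get("add")))
    return rows


def load_ips(path="/etc/cloudstack/ips.json"):
    # Run on the virtual router, where the file lives.
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

On the router, `summarize_ips(load_ips())` yields one tuple per address, so the entry for 10.0.54.124 and its flag are visible at a glance.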

kiranchavala avatar Aug 21 '24 08:08 kiranchavala

@kiranchavala Thanks. This might have been fixed by #9234.

weizhouapache avatar Aug 21 '24 08:08 weizhouapache

Closing. Reopen if needed.

weizhouapache avatar Aug 21 '24 08:08 weizhouapache