cloudstack icon indicating copy to clipboard operation
cloudstack copied to clipboard

IP address conflicts held by public IPs of virtual routers

Open vdombrovski opened this issue 10 months ago • 34 comments

ISSUE TYPE
  • Bug Report
COMPONENT NAME
VR
CLOUDSTACK VERSION
4.17.2

Note: I believe this impacts releases 4.18 and 4.19 also as no relevant code changes have been made to VR code

CONFIGURATION
  • Advanced zone
  • Standard networking using virtual routers only
  • /24 pool of "public" IPs available for the virtual routers
OS / ENVIRONMENT

Ubuntu Focal 20.04

SUMMARY

Our workflow makes it so public IPs can be assigned/unassigned/reassigned to different networks, possibly multiple times per day (mostly, but not only via https://github.com/apache/cloudstack-kubernetes-provider)

After a while, we started seeing that many unused IP addresses (according to ACS database) were actually answering to L2 ARPing. In parallel, some of our IPs are getting 2 different MACs from the ping:

ARPING XX.YY.25.156
60 bytes from 1e:00:fe:00:04:ac (XX.YY.25.156): index=0 time=306.714 usec
60 bytes from 1e:00:63:00:04:b5 (XX.YY.25.156): index=1 time=342.809 usec

Our investigation has shown these conflicts on around 10 ips in two of our /24 subnets, which looks too much to be a conicidence.

We have confirmed that both MAC addresses belong to alive and healthy virtual routers:

 [cloud]> select instance_id from nics where mac_address="1e:00:63:00:04:b5";
+-------------+
| instance_id |
+-------------+
|        5654 |
+-------------+
1 row in set (0.010 sec)

[cloud]> select instance_id from nics where mac_address="1e:00:fe:00:04:ac";
+-------------+
| instance_id |
+-------------+
|        5624 |
+-------------+
1 row in set (0.007 sec)

These routers both correctly mount the IP addresses:

ip a | grep XX.YY.25.156
    inet XX.YY.25.156/24 brd XX.YY.25.255 scope global secondary eth2

Inside the "databag" of the so called "illegitimate" router causing the conflict, we can see the IP declared (sometimes with flag add: true, sometimes with add: false):

# /etc/cloudstack/ips.json
    {
      "add": true,
      "broadcast": "XX.YY.25.255",
      "cidr": "XX.YY.25.156/24",
      "device": "eth2",
      "first_i_p": false,
      "gateway": "XX.YY.25.1",
      "is_private_gateway": false,
      "netmask": "255.255.255.0",
      "network": "XX.YY.25.0/24",
      "new_nic": false,
      "nic_dev_id": 2,
      "nw_type": "public",
      "one_to_one_nat": false,
      "public_ip": "XX.YY.25.156",
      "size": "24",
      "source_nat": false,
      "vif_mac_address": "1e:00:63:00:04:b5"
    }

To my understanding, this file is read/merged whenever a new IP state change is requested; however it's not merged properly on unassignment.

Obviously, deleting/recreating all the virtual routers would solve the issue as this cache would be cleared; however because this causes massive disruptions on the service we would like to make sure the root cause is fixed first.

STEPS TO REPRODUCE

No specific command line here, but I think trying to bounce a public IP between 2 guest routers (ofc inside separate VPCs if these are VPC networks) several time would results in this issue.

  1. Create 2 guest networks A and B
  2. Create 2 VMS, one inside network A, the other inside network B
  3. Associate an IP address to network A, and create a loadbalancer rule for some port of VM A (should also work with port-forwarding and static NAT)
  4. Release the IP, associate it to network B, and create a loadbalancer rule for some port of VM A (should also work with port-forwarding and static NAT)
  5. Repeat steps 3-4 until you get an IP address conflict (or until you see invalid entries in the databag)
EXPECTED RESULTS

The IP is properly garbage collected on the router that doesn't hold it anymore

ACTUAL RESULTS

IP address conflcit

vdombrovski avatar Apr 24 '24 08:04 vdombrovski

thanks @vdombrovski , this will (obviously) be hard to reproduce in a lab environment. the true/false flag of the databag should ensure that IPs are removed, so it not being merged properly on unassigment is the issue as you already stated.

Do you have further logging from the VR that would help identifying the culprit?

(I'll try it in a lab env, but have little hope it will be reliably reproducible)

DaanHoogland avatar Apr 24 '24 10:04 DaanHoogland

@DaanHoogland not sure if this helps, in /var/log/cloud.log I'm getting the same log repeated for all IPs:

2024-04-04 04:43:29,474 INFO     Add: rule=-I VPN_XX.YY.25.156 -m state --state RELATED,ESTABLISHED -j ACCEPT table=mangle
2024-04-04 04:43:29,475 INFO     Executing: iptables -t mangle -I VPN_XX.YY.25.156 -m state --state RELATED,ESTABLISHED -j ACCEPT
2024-04-04 04:43:30,158 INFO     Executing: iptables -t mangle -D VPN_XX.YY.25.156 -m state --state RELATED,ESTABLISHED -j ACCEPT
2024-04-04 04:43:30,162 INFO     Delete rule -D VPN_XX.YY.25.156 -m state --state RELATED,ESTABLISHED -j ACCEPT from table mangle
2024-04-04 04:43:36,448 INFO     Address found in DataBag ==> {u'public_ip': u'XX.YY.25.156', u'one_to_one_nat': False, u'nic_dev_id': 2, u'network': u'XX.YY.25.0/24', u'first_i_p': False, u'netmask': u'255.255.255.0', u'source_nat': False, u'broadcast': u'XX.YY.25.255', u'add': True, u'is_private_gateway': False, u'nw_type': u'public', u'device': u'eth2', u'vif_mac_address': u'1e:00:63:00:04:b5', u'cidr': u'XX.YY.25.156/24', u'gateway': u'XX.YY.25.1', u'new_nic': False, u'size': u'24'}
2024-04-04 04:43:36,448 INFO     Address XX.YY.25.156/24 on device eth2 already configured
2024-04-04 04:43:36,461 INFO     Executing: arping -c 1 -I eth2 -A -U -s XX.YY.25.156 XX.YY.25.1

Regarding DataBag logs specifically (again in /var/log/cloud.log), on some instances we are getting this WARNING, without any other explanation:

2024-04-23 13:11:27,089 WARNING  Unable to find and process databag for file: ip_associations.json.a81b47b8-2a91-42a2-a529-1de42849049c, for json type=ip_associations

I will setup a "brute-force" type test case that mirrors what I've described in "Steps to reproduce" using either CMK or Terraform; I'll let you know if I can get it to reproduce.

vdombrovski avatar Apr 24 '24 13:04 vdombrovski

oh, that,

2024-04-23 13:11:27,089 WARNING  Unable to find and process databag for file: ip_associations.json.a81b47b8-2a91-42a2-a529-1de42849049c, for json type=ip_associations

, sounds like a weird timing issue, as if two merge deamons are tryimg to process the file at the same time and one has already moved it...???

and it seems here,

2024-04-04 04:43:30,162 INFO     Delete rule -D VPN_XX.YY.25.156 -m state --state RELATED,ESTABLISHED -j ACCEPT from table mangle
2024-04-04 04:43:36,448 INFO     Address found in DataBag ==> {u'public_ip': u'XX.YY.25.156', u'one_to_one_nat': False, u'nic_dev_id': 2, u'network': u'XX.YY.25.0/24', u'first_i_p': False, u'netmask': u'255.255.255.0', u'source_nat': False, u'broadcast': u'XX.YY.25.255', u'add': True, u'is_private_gateway': False, u'nw_type': u'public', u'device': u'eth2', u'vif_mac_address': u'1e:00:63:00:04:b5', u'cidr': u'XX.YY.25.156/24', u'gateway': u'XX.YY.25.1', u'new_nic': False, u'size': u'24'}

, that almost 0.3 second the after the deletion it is added again.

DaanHoogland avatar Apr 24 '24 13:04 DaanHoogland

did a quick test, everything looks good this may need intensive testing to reproduce the issue

weizhouapache avatar Apr 24 '24 15:04 weizhouapache

@DaanHoogland here is how I successfully reproduced the issue on our QA platform:

56 bytes from 1e:00:4d:00:00:c1 (XX.YY.17.30): index=959 time=275.810 usec
56 bytes from 1e:00:6b:00:00:c2 (XX.YY.17.30): index=960 time=337.038 usec

You were right, this is a timing issue. Here is the raw cmk code to reproduce:

  • Create 2 isolated networks, network-a and network-b (mine are ids c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0 and 7e2aa0f1-c536-4162-be43-aea277ae80aa
  • You don't need to add any VMs, only playing with virtual routers is enough to cause the issue
  • Target a specific IP address, in my case XX.YY.17.30 (id=f1988f51-c9e5-46e4-9393-2a873b2610e0)
  • Create 2 scripts as follows (make sure cmk works properly on your machine, I'm using 6.2.0)
  • Run both scripts in parallel in a while true loop as follows:

Terminal 1:

while true; do bash ./test1.sh; done

Terminal 2:

while true; do bash ./test2.sh; done
  • Test with arping on the IP (arping XX.YY.17.30), it should take around 2-3 mins on a platform with 0 load.
#test1.sh

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0 # Network A

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=22 privateport=22 name=vm-a

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=7e2aa0f1-c536-4162-be43-aea277ae80aa # network B

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=22 privateport=22 name=vm-b

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0
#test2.sh

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0 # Network A

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=23 privateport=23 name=vm-a

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=24 privateport=24 name=vm-a

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=7e2aa0f1-c536-4162-be43-aea277ae80aa # network B

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=23 privateport=23 name=vm-b

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=24 privateport=24 name=vm-b

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

Now I understand that handling concurrency isn't really a priority for this project (see another issue from last year: https://github.com/apache/cloudstack/issues/7907)

However, with the introduction of more vectors of automation (Terraform, Kubernetes), I believe that these types of issues might become really common. In our case, the IP conflicts have impacted more than 10 IPs total on a pool of 400 IPs, which is 2.5%.

vdombrovski avatar Apr 24 '24 16:04 vdombrovski

Hi

We need that to be fix, how can we help ?

Best

bragonznx avatar Apr 24 '24 16:04 bragonznx

Don't be putting the project down like that, concurrency is one of the main pillars of the project. We have not always been too good at it (as a project), admitted. Manpower vs priorities is the issue here, i think.

Thanks for the investigation @vdombrovski ! Can you do a test on 4.19 , or preferably main as well ?

@bragonznx , code investigation, design, PR, testing (big demand on that) are all needed. What can you offer ?

cheers guys.

DaanHoogland avatar Apr 25 '24 07:04 DaanHoogland

@DaanHoogland as I said, this doesn't seem to be a priority at the current time, even though it might be one of the main pillars of the project. I'm not judging here, I'm simply stating the facts based on the change logs. I also know that people working on this project have put a lot of effort recently into enabling alternative L3-L7 solutions that might be more robust than the virtual router (such as the support of VNF appliance and Tungsten Fabric's successor), which might in term enable production-grade uses without the use of proprietary components.

Regarding testing the new release, I am not able to perform these tests on other versions as we don't have a QA platform in those releases (we have put updates on hold following all the migration issues that were reported by people on the users mailing list). We are currently waiting for the 4.18.2.0 to be released as our next upgrade candidate. Maybe is there a particular commit / diff / change from the tree that catches your eye that would help in solving this issue?

In the meantime, what could prevent this bug from occurring in a lab environment? Maybe this is something that a concurrent test case might catch?

vdombrovski avatar Apr 25 '24 10:04 vdombrovski

We are currently waiting for the 4.18.2.0 to be released as our next upgrade candidate.

it has been released ;)

As for how to prevent it, I have no ready answer. When creating the situation at fault, does cloudstack in its DB have any proof of the double assignment?

  • We can make sure that VRs are being convergently reconfigured when/if the situation is found.
  • Another solution would be a system wide lock, which is a performance hog and I would advise against.

DaanHoogland avatar Apr 25 '24 12:04 DaanHoogland

@DaanHoogland yes correct, released after I wrote my comment, which means we will surely plan our tests now.

As for proof inside the DB, I've ran more tests, and I can most definitely introduce inconsistencies. For example, my last test got an IP with allocated=NULL but state=Allocated, which in practice means this ip is now unusable until I manually reset it in the database (it cannot be assigned, it cannot be released).

select * from user_ip_address where public_ip_address="XX.YY.17.30" \G;
*************************** 1. row ***************************
                 id: 107
               uuid: f1988f51-c9e5-46e4-9393-2a873b2610e0
         account_id: NULL
          domain_id: NULL
  public_ip_address: XX.YY.17.30
     data_center_id: 1
         source_nat: 0
          allocated: NULL
         vlan_db_id: 2
     one_to_one_nat: 0
              vm_id: NULL
              state: Allocated
        mac_address: 139
  source_network_id: 200
         network_id: 225
physical_network_id: 200
          is_system: 0
             vpc_id: NULL
          dnat_vmip: NULL
        is_portable: 0
            display: 1
            removed: NULL
            created: NULL
         rule_state: NULL
       forsystemvms: 0

Note that all these cases are introduced without any modifications to my original test; I'm simply changing the timings of when I start test1.sh and test2.sh

I'll run some more tests, and report any additional anomalies I find.

vdombrovski avatar Apr 26 '24 17:04 vdombrovski

Hello

Do we have any news on this ? I believe it's too important breaking production, to wait for a sponsor to cover this patch

Best regards

bragonznx avatar May 02 '24 08:05 bragonznx

cc @weizhouapache @sureshanaparti to triage

rohityadavcloud avatar May 02 '24 09:05 rohityadavcloud

@vdombrovski @bragonznx have you tried to reproduce this issue against CloudStack 4.18.2.0 or 4.19.0.1 ? Anyone is welcome to propose a fix.

rohityadavcloud avatar May 02 '24 09:05 rohityadavcloud

@DaanHoogland here is how I successfully reproduced the issue on our QA platform:

56 bytes from 1e:00:4d:00:00:c1 (XX.YY.17.30): index=959 time=275.810 usec
56 bytes from 1e:00:6b:00:00:c2 (XX.YY.17.30): index=960 time=337.038 usec

You were right, this is a timing issue. Here is the raw cmk code to reproduce:

  • Create 2 isolated networks, network-a and network-b (mine are ids c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0 and 7e2aa0f1-c536-4162-be43-aea277ae80aa
  • You don't need to add any VMs, only playing with virtual routers is enough to cause the issue
  • Target a specific IP address, in my case XX.YY.17.30 (id=f1988f51-c9e5-46e4-9393-2a873b2610e0)
  • Create 2 scripts as follows (make sure cmk works properly on your machine, I'm using 6.2.0)
  • Run both scripts in parallel in a while true loop as follows:

Terminal 1:

while true; do bash ./test1.sh; done

Terminal 2:

while true; do bash ./test2.sh; done
  • Test with arping on the IP (arping XX.YY.17.30), it should take around 2-3 mins on a platform with 0 load.
#test1.sh

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0 # Network A

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=22 privateport=22 name=vm-a

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=7e2aa0f1-c536-4162-be43-aea277ae80aa # network B

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=22 privateport=22 name=vm-b

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0
#test2.sh

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0 # Network A

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=23 privateport=23 name=vm-a

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=24 privateport=24 name=vm-a

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=7e2aa0f1-c536-4162-be43-aea277ae80aa # network B

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=23 privateport=23 name=vm-b

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=24 privateport=24 name=vm-b

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

Now I understand that handling concurrency isn't really a priority for this project (see another issue from last year: #7907)

However, with the introduction of more vectors of automation (Terraform, Kubernetes), I believe that these types of issues might become really common. In our case, the IP conflicts have impacted more than 10 IPs total on a pool of 400 IPs, which is 2.5%.

thanks @vdombrovski
I adapted the two scripts you shared, and run them in parallel for around 2 hours with a public IP, two isolated networks, two vms I did not face any issue. I tested with latest main branch (4.20-SNAPSHOT), not 4.17.2

Can you share the full management-server.log for further investigation ?

weizhouapache avatar May 02 '24 12:05 weizhouapache

@weizhouapache because it's a timing issue, you need to make sure that both scripts run with a delay, otherwise you are getting a favorable case where both operations target network A, then both target network B etc..

Try modifying test2 to first target network B, and add a random sleep after disassociating the IP; Both of these changes greatly help with reproducting the issue (I could reproduce again now in < 1min), like this (leave test1 the exact same):

#test2.sh

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=7e2aa0f1-c536-4162-be43-aea277ae80aa # network B

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=23 privateport=23 name=vm-b

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=24 privateport=24 name=vm-b

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

sleep $((RANDOM % 10))

cmk associate ipaddress projectid=05019f40-8552-4856-9cf1-02ae1b7e2621 zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7 ipaddress=XX.YY.17.30 account=qa networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0 # Network A

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=23 privateport=23 name=vm-a

cmk createLoadBalancerRule algorithm=roundrobin publicipid=f1988f51-c9e5-46e4-9393-2a873b2610e0 publicport=24 privateport=24 name=vm-a

cmk disassociate ipaddress id=f1988f51-c9e5-46e4-9393-2a873b2610e0

sleep $((RANDOM % 10))

As requested, here are the relevant logs from the manager, filtered for assignment+release of IP addresses, at the exact time the conflict occurs:

2024-05-02 14:32:41,747 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6674:ctx-b0b01e22) (logid:ace12149) ===START===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:41,805 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6674:ctx-b0b01e22 ctx-083bb8ea ctx-1ad4ec66) (logid:ace12149) ===END===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:41,837 DEBUG [c.c.n.IpAddressManagerImpl] (API-Job-Executor-30:ctx-4262738c job-2777 ctx-cd5a7ce3) (logid:eb86be5e) Successfully associated ip address XX.YY.17.30 to network Network {"id": 225, "name": "network-A", "uuid": "c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0", "networkofferingid": 9}
2024-05-02 14:32:41,850 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-30:ctx-4262738c job-2777 ctx-cd5a7ce3) (logid:eb86be5e) Complete async job-2777, jobStatus: SUCCEEDED, resultCode: 0, result: org.apache.cloudstack.api.response.IPAddressResponse/ipaddress/{"id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ipaddress":"XX.YY.17.30","allocated":"2024-05-02T14:32:41+0200","zoneid":"3adfd657-7e4e-4613-99da-49b3a0ef9db7","zonename":"ZONE","issourcenat":"false","projectid":"05019f40-8552-4856-9cf1-02ae1b7e2621","project":"testcpu","domainid":"e8858164-61cb-47ae-b05e-21163381e5ff","domain":"RND","forvirtualnetwork":"true","vlanid":"5c4f2de6-4555-4f8c-8af2-f2fd325bf101","vlanname":"vlan://untagged","isstaticnat":"false","issystem":"false","associatednetworkid":"c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0","associatednetworkname":"network-A","networkid":"08fb8c61-c810-487b-b931-1911a72a0f3e","state":"Allocated","physicalnetworkid":"cc445c30-6484-47e5-b13b-ffdd6554c02b","tags":[],"isportable":"false","fordisplay":"true","hasannotations":"false"}
2024-05-02 14:32:42,902 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6814:ctx-826cfc0d) (logid:08064c3d) ===START===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:42,932 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6814:ctx-826cfc0d ctx-96b1e3fd ctx-d8a477c6) (logid:08064c3d) ===END===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:48,278 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6705:ctx-3cafb699) (logid:b65f7570) ===START===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D
2024-05-02 14:32:48,301 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp1750498848-6705:ctx-3cafb699 ctx-59398c9e ctx-8f5a2116) (logid:b65f7570) submit async job-2780, details: AsyncJobVO: {id:2780, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70154","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:48,301 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-33:ctx-7c0ac951 job-2780) (logid:94060328) Executing AsyncJobVO: {id:2780, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70154","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-
46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:48,302 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6705:ctx-3cafb699 ctx-59398c9e ctx-8f5a2116) (logid:b65f7570) ===END===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D
2024-05-02 14:32:51,347 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6807:ctx-ac7656e2) (logid:ae5760e6) ===START===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D
2024-05-02 14:32:51,362 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp1750498848-6807:ctx-ac7656e2 ctx-07d762b2 ctx-c606fd19) (logid:ae5760e6) submit async job-2782, details: AsyncJobVO: {id:2782, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70161","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:51,362 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-35:ctx-1248a5f3 job-2782) (logid:bb1f9468) Executing AsyncJobVO: {id:2782, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70161","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:51,363 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6807:ctx-ac7656e2 ctx-07d762b2 ctx-c606fd19) (logid:ae5760e6) ===END===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D
2024-05-02 14:33:07,514 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-33:ctx-7c0ac951 job-2780) (logid:94060328) Complete async job-2780, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to disassociate IP address"}
2024-05-02 14:32:41,747 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6674:ctx-b0b01e22) (logid:ace12149) ===START===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:41,805 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6674:ctx-b0b01e22 ctx-083bb8ea ctx-1ad4ec66) (logid:ace12149) ===END===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:41,837 DEBUG [c.c.n.IpAddressManagerImpl] (API-Job-Executor-30:ctx-4262738c job-2777 ctx-cd5a7ce3) (logid:eb86be5e) Successfully associated ip address XX.YY.17.30 to network Network {"id": 225, "name": "network-A", "uuid": "c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0", "networkofferingid": 9}
2024-05-02 14:32:41,850 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-30:ctx-4262738c job-2777 ctx-cd5a7ce3) (logid:eb86be5e) Complete async job-2777, jobStatus: SUCCEEDED, resultCode: 0, result: org.apache.cloudstack.api.response.IPAddressResponse/ipaddress/{"id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ipaddress":"XX.YY.17.30","allocated":"2024-05-02T14:32:41+0200","zoneid":"3adfd657-7e4e-4613-99da-49b3a0ef9db7","zonename":"ZONE","issourcenat":"false","projectid":"05019f40-8552-4856-9cf1-02ae1b7e2621","project":"testcpu","domainid":"e8858164-61cb-47ae-b05e-21163381e5ff","domain":"RND","forvirtualnetwork":"true","vlanid":"5c4f2de6-4555-4f8c-8af2-f2fd325bf101","vlanname":"vlan://untagged","isstaticnat":"false","issystem":"false","associatednetworkid":"c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0","associatednetworkname":"network-A","networkid":"08fb8c61-c810-487b-b931-1911a72a0f3e","state":"Allocated","physicalnetworkid":"cc445c30-6484-47e5-b13b-ffdd6554c02b","tags":[],"isportable":"false","fordisplay":"true","hasannotations":"false"}
2024-05-02 14:32:42,902 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6814:ctx-826cfc0d) (logid:08064c3d) ===START===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:42,932 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6814:ctx-826cfc0d ctx-96b1e3fd ctx-d8a477c6) (logid:08064c3d) ===END===  X.Y.191.80 -- GET  account=test-admin&apiKey=REDACTED&command=associateIpAddress&ipaddress=XX.YY.17.30&networkid=c6a4bc87-93cc-499c-8fdc-4e44b4d0fca0&projectid=05019f40-8552-4856-9cf1-02ae1b7e2621&response=json&zoneid=3adfd657-7e4e-4613-99da-49b3a0ef9db7&signature=pcavcaveiIHGjw%2FlWwFeH3mX63Q%3D
2024-05-02 14:32:48,278 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6705:ctx-3cafb699) (logid:b65f7570) ===START===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D
2024-05-02 14:32:48,301 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp1750498848-6705:ctx-3cafb699 ctx-59398c9e ctx-8f5a2116) (logid:b65f7570) submit async job-2780, details: AsyncJobVO: {id:2780, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70154","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:48,301 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-33:ctx-7c0ac951 job-2780) (logid:94060328) Executing AsyncJobVO: {id:2780, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70154","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:48,302 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6705:ctx-3cafb699 ctx-59398c9e ctx-8f5a2116) (logid:b65f7570) ===END===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D
2024-05-02 14:32:51,347 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6807:ctx-ac7656e2) (logid:ae5760e6) ===START===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D
2024-05-02 14:32:51,362 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp1750498848-6807:ctx-ac7656e2 ctx-07d762b2 ctx-c606fd19) (logid:ae5760e6) submit async job-2782, details: AsyncJobVO: {id:2782, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70161","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:51,362 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-35:ctx-1248a5f3 job-2782) (logid:bb1f9468) Executing AsyncJobVO: {id:2782, userId: 2, accountId: 2, instanceType: IpAddress, instanceId: 107, cmd: org.apache.cloudstack.api.command.user.address.DisassociateIPAddrCmd, cmdInfo: {"apiKey":"REDACTED","signature":"02Sf59NUDKw/Kg/ju1gg+1229mQ\u003d","response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"70161","id":"f1988f51-c9e5-46e4-9393-2a873b2610e0","ctxDetails":"{\"interface com.cloud.network.IpAddress\":\"f1988f51-c9e5-46e4-9393-2a873b2610e0\"}","ctxAccountId":"2","uuid":"f1988f51-c9e5-46e4-9393-2a873b2610e0","cmdEventType":"NET.IPRELEASE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 198690281380792, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2024-05-02 14:32:51,363 DEBUG [c.c.a.ApiServlet] (qtp1750498848-6807:ctx-ac7656e2 ctx-07d762b2 ctx-c606fd19) (logid:ae5760e6) ===END===  X.Y.191.80 -- GET  apiKey=REDACTED&command=disassociateIpAddress&id=f1988f51-c9e5-46e4-9393-2a873b2610e0&response=json&signature=02Sf59NUDKw%2FKg%2Fju1gg%2B1229mQ%3D

vdombrovski avatar May 02 '24 12:05 vdombrovski

@rohityadavcloud do you have any fix of this in the changelog from 4.18 or 4.19 ? Because if it's not the case, I don't see the point you mentioned at all

We will reproduce this behavior on a 4.19 if need in due time, but for now we have 4 or 5 platform in 4.17 who can't be migrate because 4.18 didn't passed our testing scenario yet : each time we had regression

Best

bragonznx avatar May 02 '24 12:05 bragonznx

We haven't any java development capability : we only have Python developer in our company. But I'm open to any proposal to fix this very annoying issue.

bragonznx avatar May 02 '24 13:05 bragonznx

@vdombrovski @bragonznx after some hours, I am able to reproduce the issue twice

it looks like the issue is caused by concurrent actions

  • disassociate IP address which removes firewall rules, etc
  • but at the same time, there is another session creates a firewall rule on the IP

The action failed but the IP is marked as Free, but actually it is still configured in the VRs

To confirm it, can you check the management server log

egrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log

weizhouapache avatar May 03 '24 10:05 weizhouapache

root@mgt-01:~# zgrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log.*

root@mgt-01:~#

root@mgt-01:~# egrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log

root@mgt-01:~#

bragonznx avatar May 06 '24 10:05 bragonznx

zgrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log.*

@bragonznx can you use zegrep instead ?

zegrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log.*

weizhouapache avatar May 06 '24 11:05 weizhouapache

root@mgt-01:~# zegrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log.*

root@mgt-01:~#

no matter the kind of grep you want to use, there is no occurrence in our production logs.

bragonznx avatar May 06 '24 11:05 bragonznx

root@mgt-01:~# zegrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log.*

root@mgt-01:~#

no matter the kind of grep you want to use, there is no occurrence in our production logs.

@bragonznx if you have multiple management servers, can you check the logs on other management servers ?

weizhouapache avatar May 06 '24 12:05 weizhouapache

zegrep 'Failed to release | Unable to revoke ' /var/log/cloudstack/management/management-server.log.*

is the pipe symbol within the single quotes a problem in this command?

maybe try

zegrep 'Failed to release.*Unable to revoke' /var/log/cloudstack/management/management-server.log.*

instead?

DaanHoogland avatar May 06 '24 12:05 DaanHoogland

root@mgt-01:~# zegrep 'Failed to release.*Unable to revoke' /var/log/cloudstack/management/management-server.log.*

root@mgt-01:~#

bragonznx avatar May 06 '24 13:05 bragonznx

root@mgt-01:~# zegrep 'Failed to release.*Unable to revoke' /var/log/cloudstack/management/management-server.log.*

root@mgt-01:~#

@bragonznx Can you check other management servers if any?

weizhouapache avatar May 06 '24 14:05 weizhouapache

It's on management server where @vdombrovski did all the tests, so it should be here....

bragonznx avatar May 06 '24 14:05 bragonznx

It's on management server where @vdombrovski did all the tests, so it should be here....

@bragonznx it would be good to upload the management server logs. if not possible, can you please search Exception/error in the logs ?

zegrep -i "exception|error" /var/log/cloudstack/management/management-server.log.*

weizhouapache avatar May 06 '24 14:05 weizhouapache

What do you want to proof ? Because we have both got the issue of this IP conflicts issue. I can't spend my time digging in Exception/error, I have too much of them.

If you need me to copy past millions of "/var/log/cloudstack/management/management-server.log.2024-01-05.gz:2024-01-05 02:55:12,904 ERROR [c.c.s.StatsCollector] (StatsCollector-5:ctx-24d81e3e) (logid:81fce2ab)" you have to explain me the value, because for me it's around 0. We don't have to proof again the issue, everyone in the discussion know that it's real, and the issue is here, Vladimir replicated it several times, and you managed to do it on your side too.

bragonznx avatar May 06 '24 14:05 bragonznx

What do you want to proof ? Because we have both got the issue of this IP conflicts issue. I can't spend my time digging in Exception/error, I have too much of them.

If you need me to copy past millions of "/var/log/cloudstack/management/management-server.log.2024-01-05.gz:2024-01-05 02:55:12,904 ERROR [c.c.s.StatsCollector] (StatsCollector-5:ctx-24d81e3e) (logid:81fce2ab)" you have to explain me the value, because for me it's around 0. We don't have to proof again the issue, everyone in the discussion know that it's real, and the issue is here, Vladimir replicated it several times, and you managed to do it on your side too.

@bragonznx yes, I have reproduced the issue. However, it looks like the root cause is different from yours, because I saw some errors like Failed to release resources for ip address and Unable to revoke all the firewall rules for ip sometimes in my log, but you did not find it in your logs.

I did not expect you dig into the management server logs. I just requested you to share some management server log as the logs might give some clue what the root cause is.

It is up to you upload or not upload the management server logs (of the days when @vdombrovski tested).

weizhouapache avatar May 06 '24 15:05 weizhouapache

What do you want to proof ?

@bragonznx , if you really want to provide "proof", it would need to be a fix. This is open source and Wei is trying to figure it out , out of interest.

Because we have both got the issue of this IP conflicts issue. I can't spend my time digging in Exception/error, I have too much of them.

Neither can anybody else spend the time for you, if you are not willing.

If you need me to copy past millions of "/var/log/cloudstack/management/management-server.log.2024-01-05.gz:2024-01-05 02:55:12,904 ERROR [c.c.s.StatsCollector] (StatsCollector-5:ctx-24d81e3e) (logid:81fce2ab)" you have to explain me the value, because for me it's around 0. We don't have to proof again the issue, everyone in the discussion know that it's real, and the issue is here, Vladimir replicated it several times, and you managed to do it on your side too.

Nobody is obliged to care about your issue either! Again, this is open source and interested people are willing to help you solve it, but please pay them respect.

DaanHoogland avatar May 07 '24 06:05 DaanHoogland