
OpenStack neutron-server (Caracal release) using Calico as core_plugin returns an error when creating security group rules

sp3c1k opened this issue 1 year ago • 3 comments

When using Calico in neutron-server (Caracal release) as the core plugin:

neutron.conf

    [DEFAULT]
    core_plugin = calico

there seems to be a problem when creating security group rules.

neutron-server.log

2024-09-17 10:58:05.713 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] security_groups_rule_updated: <neutron_lib.context.Context object at 0x7c8d10a5b430> ['3afec1e5-116e-4966-8271-02de2ceca667']
2024-09-17 10:58:05.713 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] Calico state already initialised for PID 1599056
2024-09-17 10:58:05.714 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] Updating security group IDs ['3afec1e5-116e-4966-8271-02de2ceca667']
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] create failed: No details.: RuntimeError: Method <function remove_reservation at 0x7c8d125953f0> cannot be called within a transaction.
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource Traceback (most recent call last):
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/resource.py", line 98, in resource
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     result = method(request=request, **args)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 440, in create
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return self._create(request, body, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 137, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     with excutils.save_and_reraise_exception():
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return f(*args, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 144, in wrapper
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     with excutils.save_and_reraise_exception() as ectxt:
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return f(*args, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     with excutils.save_and_reraise_exception():
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 181, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return f(*dup_args, **dup_kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 567, in _create
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return notify({self._resource: self._view(request.context,
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 507, in notify
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     quota.QUOTAS.commit_reservation(
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/quota/__init__.py", line 103, in commit_reservation
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.get_driver().commit_reservation(context, reservation_id)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/db/quota/driver.py", line 271, in commit_reservation
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     quota_api.remove_reservation(context, reservation_id,
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/common/utils.py", line 724, in inner
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise RuntimeError(_("Method %s cannot be called within a "
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource RuntimeError: Method <function remove_reservation at 0x7c8d125953f0> cannot be called within a transaction.
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource
2024-09-17 10:58:05.788 1599056 INFO neutron.wsgi [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] 10.230.8.7,127.0.0.1 "POST /v2.0/security-group-rules HTTP/1.1" status: 500  len: 344 time: 0.2229519
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 8fb5aa7a-a1e4-4f46-927e-8e5729437350 | None        | IPv4      | 0.0.0.0/0 |            | egress    | None                  | None                 |
| b09b1bdc-ee3d-4086-8b70-8df02696d760 | None        | IPv6      | ::/0      |            | egress    | None                  | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
root@c1:/var/log/neutron# openstack security group rule create --ingress --remote-ip 0.0.0.0/0 --protocol tcp --dst-port 55 specik-test
Error while executing command: HttpException: 500, Request Failed: internal server error while processing your request.
root@c1:/var/log/neutron# openstack security group rule list specik-test
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 8fb5aa7a-a1e4-4f46-927e-8e5729437350 | None        | IPv4      | 0.0.0.0/0 |            | egress    | None                  | None                 |
| a5afa681-72fa-4df3-a502-d719699d7a83 | tcp         | IPv4      | 0.0.0.0/0 | 55:55      | ingress   | None                  | None                 |
| b09b1bdc-ee3d-4086-8b70-8df02696d760 | None        | IPv6      | ::/0      |            | egress    | None                  | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+

Although the API returns HTTP 500, the rule is created.

Expected Behavior

The security group rule should be created without an error.

Current Behavior

When creating a security group rule, an HTTP 500 is returned, but the rule is created anyway.

This behavior seems to be caused by a combination of changes in oslo_db, neutron and neutron_lib: how sessions are created, how the context carrying the session is propagated through the application, and the preparations for SQLAlchemy 2.0.

An OpenStack devstack deployment on the Caracal release does not show this issue with native networking; we only encounter it when Calico is used as the core plugin.

Possible Solution

The problem seems to come from: https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L886-L896

conn_url = str(session.connection().engine.url).lower() calls session.connection(), which opens a new connection (and a transaction) on the session just to read the engine URL, so that the URL can be used in this part of the code: https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L895-L909
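A simplified, stdlib-only model of what this line does to the session, assuming SQLAlchemy 1.4+/2.0 "autobegin" semantics: calling session.connection() on a session with no transaction in progress starts one and checks a connection out into it. FakeEngine, FakeConnection, FakeTransaction and FakeSession are hypothetical stand-ins, not SQLAlchemy classes; the _connections attribute only mirrors the private name that is_session_active inspects, and the MySQL URL is made up.

```python
class FakeEngine:
    """Hypothetical stand-in for an SQLAlchemy Engine; only carries a URL."""

    def __init__(self, url):
        self.url = url


class FakeConnection:
    """Hypothetical stand-in for a Connection checked out from an engine."""

    def __init__(self, engine):
        self.engine = engine


class FakeTransaction:
    """Models a SessionTransaction: it tracks the connections it procured."""

    def __init__(self):
        self._connections = {}


class FakeSession:
    """Models a 2.0-style Session (autocommit=False) with autobegin."""

    def __init__(self, bind):
        self.bind = bind
        self.autocommit = False
        self._transaction = None

    def get_transaction(self):
        return self._transaction

    def connection(self):
        # Autobegin: start a transaction if none is in progress...
        if self._transaction is None:
            self._transaction = FakeTransaction()
        # ...then procure (or reuse) a connection for the bound engine.
        conns = self._transaction._connections
        if self.bind not in conns:
            conns[self.bind] = FakeConnection(self.bind)
        return conns[self.bind]


session = FakeSession(FakeEngine("mysql+pymysql://neutron@db/neutron"))
print(session.get_transaction())  # None: no transaction has been started yet

# The pattern used in _txn_from_context(): read the URL via a connection.
conn_url = str(session.connection().engine.url).lower()
print(conn_url)  # mysql+pymysql://neutron@db/neutron
print(len(session.get_transaction()._connections))  # 1
```

After the URL lookup, the modeled session holds a transaction containing one connection, which is the state that the newer is_session_active check treats as active.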

However, this seems to cause a problem later in Neutron, in the is_session_active method, which moved from neutron to neutron_lib:

  • old: https://github.com/openstack/neutron/blob/19ab990dc20aaa7d4fd91fc3aad8aff9407677fb/neutron/common/utils.py#L1034-L1051
  • new: https://github.com/openstack/neutron-lib/blob/970c2bdebaa19c5ae63e616ba0648c8edaae2b2f/neutron_lib/db/api.py#L465-L483

With the Yoga release, where we do not see this problem, the method exits via:

    if session.autocommit:  # old behaviour, to be removed with sqlalchemy 2.0
        return session.is_active

meaning autocommit = True and is_active = False (at least, that is what I see in the debugger).

With the Caracal release, the situation is different: is_session_active returns True.

    if getattr(session, 'autocommit', None):
        # old behaviour, to be removed with sqlalchemy 2.0
        return session.is_active
    if not session.get_transaction():
        return False
    if not session.get_transaction()._connections:
        return False
    return True

  • autocommit = False (this seems to be related to changes in oslo_db)
  • is_active = True (this does not seem to matter as long as autocommit is not True)
  • session.get_transaction() returns a truthy value, because a transaction is in progress on the session
  • session.get_transaction()._connections is also truthy, because the transaction holds a connection (the one Calico opens, as mentioned earlier). This probably did not matter until now, because the method always returned early via the if session.autocommit: check shown above.
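To make the difference concrete, the sketch below wraps the Caracal-era branches quoted above in a function and runs them against hypothetical stub sessions (StubSession and StubTransaction are illustrative assumptions, not SQLAlchemy or neutron_lib classes). Only the function body follows the quoted neutron_lib lines; one stub resembles the Yoga state seen in the debugger, another the Caracal state after Calico's session.connection() call.

```python
class StubTransaction:
    """Hypothetical stand-in for a SessionTransaction."""

    def __init__(self, connections):
        self._connections = connections


class StubSession:
    """Hypothetical stand-in for a Session, exposing only what the check reads."""

    def __init__(self, autocommit, is_active, transaction=None):
        self.autocommit = autocommit
        self.is_active = is_active
        self._transaction = transaction

    def get_transaction(self):
        return self._transaction


def is_session_active(session):
    # Same branches as the neutron_lib snippet quoted above.
    if getattr(session, 'autocommit', None):
        # old behaviour, to be removed with sqlalchemy 2.0
        return session.is_active
    if not session.get_transaction():
        return False
    if not session.get_transaction()._connections:
        return False
    return True


# Yoga-like state: autocommit session, not active.
yoga = StubSession(autocommit=True, is_active=False)

# Caracal-like state: autocommit off, and a transaction that already holds a
# connection (the one opened just to read the engine URL).
caracal = StubSession(autocommit=False, is_active=True,
                      transaction=StubTransaction({'engine': 'conn'}))

# Caracal-like session on which no connection has been opened yet.
untouched = StubSession(autocommit=False, is_active=True)

print(is_session_active(yoga))       # False
print(is_session_active(caracal))    # True
print(is_session_active(untouched))  # False
```

The untouched case suggests why avoiding the extra session.connection() call is enough: without a procured connection, the check falls through to False.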

What helped me resolve this issue

I fixed this behavior with the following change to _txn_from_context() in networking_calico/plugins/ml2/drivers/calico/mech_calico.py:

        if getattr(session, 'bind', None):
            conn_url = str(session.bind.url).lower()
        else:
            conn_url = str(session.connection().engine.url).lower()

The connection URL can be obtained from session.bind (as I observed in the debugger).

        # sqlalchemy/orm/session.py
        # :param bind: An optional :class:`_engine.Engine` or
        #    :class:`_engine.Connection` to
        #    which this ``Session`` should be bound. When specified, all SQL
        #    operations performed by this session will execute via this
        #    connectable.

That means that if the session is already bound to the database (via an engine or connection), we can read the URL from that bind instead of opening a new connection just to get conn_url.

The fix keeps the old behavior as a fallback: if session.bind is None, it still calls session.connection() to get the URL.
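The bind-first logic can be exercised in isolation. This sketch wraps it in a hypothetical helper, get_conn_url, and runs it against a stub session that counts connection() calls (StubEngine, StubConnection and CountingSession are assumptions for illustration, not SQLAlchemy classes). With a bind present, the URL is read without touching the session's transaction state; without one, it falls back to the original session.connection() path.

```python
class StubEngine:
    """Hypothetical engine stand-in exposing only `url`."""

    def __init__(self, url):
        self.url = url


class StubConnection:
    """Hypothetical connection stand-in exposing only `engine`."""

    def __init__(self, engine):
        self.engine = engine


class CountingSession:
    """Hypothetical session stub that records calls to connection()."""

    def __init__(self, bind, engine):
        self.bind = bind
        self._engine = engine
        self.connection_calls = 0

    def connection(self):
        # In SQLAlchemy this would autobegin a transaction; here we just count.
        self.connection_calls += 1
        return StubConnection(self._engine)


def get_conn_url(session):
    # Prefer the session's bind: reading its URL opens no connection.
    if getattr(session, 'bind', None):
        return str(session.bind.url).lower()
    # Fallback: the original behaviour, which procures a connection.
    return str(session.connection().engine.url).lower()


engine = StubEngine("MySQL+PyMySQL://neutron@db/neutron")

bound = CountingSession(bind=engine, engine=engine)
print(get_conn_url(bound), bound.connection_calls)
# mysql+pymysql://neutron@db/neutron 0

unbound = CountingSession(bind=None, engine=engine)
print(get_conn_url(unbound), unbound.connection_calls)
# mysql+pymysql://neutron@db/neutron 1
```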

Steps to Reproduce (for bugs)

  1. Set up an OpenStack Caracal release
  2. Use calico as the core_plugin for neutron-server
  3. Create a new security group
  4. Create a new security group rule in that security group

Context

This issue was found after upgrading an OpenStack cluster from the Yoga release to Caracal.

The security group rule is created successfully, but the Neutron API returns HTTP 500. So far I have only encountered this issue when creating new security group rules, and it always happens when Neutron tries to remove the quota reservation, as seen in the log at the top of this issue.
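The traceback shows the failing call is quota_api.remove_reservation, wrapped by an inner function in neutron/common/utils.py that refuses to run inside a transaction. The following is a hypothetical, simplified guard modeled only on that error message (transaction_guard, StubContext and StubSession are illustrative assumptions, and the real decorator's condition may differ); the stub's active flag stands in for the result of is_session_active. It shows how a session left active after the rule has been written turns the final quota step into an exception, and therefore an HTTP 500.

```python
import functools


class StubSession:
    """Hypothetical session stub: `active` stands in for is_session_active()."""

    def __init__(self, active):
        self.active = active


class StubContext:
    """Hypothetical request context carrying a session."""

    def __init__(self, session):
        self.session = session


def transaction_guard(f):
    """Illustrative guard: refuse to run f if the context's session is active."""

    @functools.wraps(f)
    def inner(context, *args, **kwargs):
        if context.session.active:
            raise RuntimeError("Method %s cannot be called within a "
                               "transaction." % f)
        return f(context, *args, **kwargs)

    return inner


@transaction_guard
def remove_reservation(context, reservation_id):
    # Stand-in for the quota bookkeeping done after the rule is created.
    return "removed %s" % reservation_id


# Session left clean: the reservation is removed normally.
print(remove_reservation(StubContext(StubSession(active=False)), "r1"))

# Session left with an open transaction (e.g. by session.connection()).
try:
    remove_reservation(StubContext(StubSession(active=True)), "r2")
except RuntimeError as exc:
    print("API would return HTTP 500:", exc)
```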

Your Environment

  • Calico version: 3.28.1
  • Operating System and version: Ubuntu 22.04.4 LTS
  • Openstack version: 2024.1 (Caracal)

sp3c1k · Sep 17 '24 13:09

I'm sorry for another lengthy issue; I've tried to gather as much information as I could. There seem to be quite a lot of changes to Neutron in the new OpenStack releases, and some of them are currently not compatible with using Calico as the Neutron core_plugin.

sp3c1k · Sep 17 '24 14:09

@nelljerram Can you please take a look?

mazdakn · Sep 24 '24 16:09

@sp3c1k As you probably know, we don't officially support Caracal yet. I think it will be more efficient for us to investigate this kind of issue when we take on the job as a whole of supporting Caracal.

FYI I've kicked off a PR at https://github.com/projectcalico/calico/pull/9278 to start looking at Caracal - but I can't make any promises about how quickly that will progress.

nelljerram · Sep 25 '24 11:09