openstack neutron-server (caracal release) using calico as core_plugin returns error when creating security group rules
When using calico in neutron-server (caracal release) as a core plugin:
neutron.conf
[DEFAULT]
core_plugin = calico
there seems to be a problem when creating security group rules.
neutron-server.log
2024-09-17 10:58:05.713 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] security_groups_rule_updated: <neutron_lib.context.Context object at 0x7c8d10a5b430> ['3afec1e5-116e-4966-8271-02de2ceca667']
2024-09-17 10:58:05.713 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] Calico state already initialised for PID 1599056
2024-09-17 10:58:05.714 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] Updating security group IDs ['3afec1e5-116e-4966-8271-02de2ceca667']
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] create failed: No details.: RuntimeError: Method <function remove_reservation at 0x7c8d125953f0> cannot be called within a transaction.
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource Traceback (most recent call last):
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron/api/v2/resource.py", line 98, in resource
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource result = method(request=request, **args)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 440, in create
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource return self._create(request, body, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 137, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource with excutils.save_and_reraise_exception():
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource return f(*args, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 144, in wrapper
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource with excutils.save_and_reraise_exception() as ectxt:
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource return f(*args, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource with excutils.save_and_reraise_exception():
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 181, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource return f(*dup_args, **dup_kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 567, in _create
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource return notify({self._resource: self._view(request.context,
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 507, in notify
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource quota.QUOTAS.commit_reservation(
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron/quota/__init__.py", line 103, in commit_reservation
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource self.get_driver().commit_reservation(context, reservation_id)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron/db/quota/driver.py", line 271, in commit_reservation
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource quota_api.remove_reservation(context, reservation_id,
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource File "/usr/lib/python3/dist-packages/neutron/common/utils.py", line 724, in inner
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource raise RuntimeError(_("Method %s cannot be called within a "
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource RuntimeError: Method <function remove_reservation at 0x7c8d125953f0> cannot be called within a transaction.
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource
2024-09-17 10:58:05.788 1599056 INFO neutron.wsgi [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] 10.230.8.7,127.0.0.1 "POST /v2.0/security-group-rules HTTP/1.1" status: 500 len: 344 time: 0.2229519
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID | IP Protocol | Ethertype | IP Range | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 8fb5aa7a-a1e4-4f46-927e-8e5729437350 | None | IPv4 | 0.0.0.0/0 | | egress | None | None |
| b09b1bdc-ee3d-4086-8b70-8df02696d760 | None | IPv6 | ::/0 | | egress | None | None |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
root@c1:/var/log/neutron# openstack security group rule create --ingress --remote-ip 0.0.0.0/0 --protocol tcp --dst-port 55 specik-test
Error while executing command: HttpException: 500, Request Failed: internal server error while processing your request.
root@c1:/var/log/neutron# openstack security group rule list specik-test
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID | IP Protocol | Ethertype | IP Range | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 8fb5aa7a-a1e4-4f46-927e-8e5729437350 | None | IPv4 | 0.0.0.0/0 | | egress | None | None |
| a5afa681-72fa-4df3-a502-d719699d7a83 | tcp | IPv4 | 0.0.0.0/0 | 55:55 | ingress | None | None |
| b09b1bdc-ee3d-4086-8b70-8df02696d760 | None | IPv6 | ::/0 | | egress | None | None |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
Although the API returns HTTP 500, the rule is created.
Expected Behavior
The security group rule should be created without an error.
Current Behavior
When creating a security group rule, HTTP 500 is returned, but the rule is created anyway.
This behavior seems to be caused by several factors: changes in oslo_db, neutron and neutron_lib regarding how sessions are created and how the context carrying the session is propagated through the application, as well as changes made in preparation for SQLAlchemy 2.0.
OpenStack DevStack on the Caracal release does not have this issue with native networking. We only encounter it when calico is used as the core plugin.
Possible Solution
The problem seems to be coming from: https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L886-L896
conn_url = str(session.connection().engine.url).lower() opens a new connection on the session just to get the engine URL, which is then used in this part of the code:
https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L895-L909
However that seems to be a problem later in neutron where the is_session_active method moved from neutron to neutron_lib:
- old: https://github.com/openstack/neutron/blob/19ab990dc20aaa7d4fd91fc3aad8aff9407677fb/neutron/common/utils.py#L1034-L1051
- new: https://github.com/openstack/neutron-lib/blob/970c2bdebaa19c5ae63e616ba0648c8edaae2b2f/neutron_lib/db/api.py#L465-L483
With the Yoga release, where we do not see this problem, the method exits via
if session.autocommit:  # old behaviour, to be removed with sqlalchemy 2.0
    return session.is_active
meaning autocommit = True and is_active = False, at least that is what I am seeing in the debugger.
With the Caracal release the situation is different and is_session_active returns True:
if getattr(session, 'autocommit', None):
    # old behaviour, to be removed with sqlalchemy 2.0
    return session.is_active
if not session.get_transaction():
    return False
if not session.get_transaction()._connections:
    return False
return True
- autocommit = False (seems to be related to changes in oslo_db)
- is_active = True (it looks like it does not matter as long as autocommit is not True)
- session.get_transaction() evaluates as True, because there are transactions inside the session
- session.get_transaction()._connections also evaluates as True, as there are connections inside the transaction (from calico opening a connection as mentioned earlier; it probably did not matter up to this point, because the is_session_active check always exited via the if session.autocommit: branch as written above).
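To make the sequence above easier to follow, here is a minimal sketch that models it with stand-in classes rather than real SQLAlchemy objects. FakeSession and FakeTransaction are hypothetical names used only for illustration, and the assumption that checking out a connection registers it in the transaction's _connections is a simplification of what I see in the debugger. The is_session_active function repeats the branching quoted above.

```python
class FakeTransaction:
    """Hypothetical stand-in for SQLAlchemy's session transaction object."""

    def __init__(self):
        # Filled in only when a connection is checked out (simplified model).
        self._connections = {}


class FakeSession:
    """Hypothetical stand-in for a Session without the legacy autocommit flag."""

    def __init__(self):
        self._transaction = None

    def begin(self):
        self._transaction = FakeTransaction()

    def get_transaction(self):
        return self._transaction

    def connection(self):
        # Assumption: checking out a connection registers it with the
        # current transaction, starting one if necessary.
        if self._transaction is None:
            self.begin()
        conn = object()
        self._transaction._connections[id(conn)] = conn
        return conn


def is_session_active(session):
    """Same branching as the neutron_lib check quoted above."""
    if getattr(session, 'autocommit', None):
        # old behaviour, to be removed with sqlalchemy 2.0
        return session.is_active
    if not session.get_transaction():
        return False
    if not session.get_transaction()._connections:
        return False
    return True


session = FakeSession()
before_begin = is_session_active(session)       # no transaction yet
session.begin()
before_connection = is_session_active(session)  # transaction, no connections
session.connection()                            # what the calico code path does
after_connection = is_session_active(session)   # transaction with a connection
print(before_begin, before_connection, after_connection)
```

In this model the session only counts as active once connection() has been called, which matches the point where the quota code later refuses to run remove_reservation.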
What helped me to resolve this issue
I fixed this behavior with this in networking_calico/plugins/ml2/drivers/calico/mech_calico.py:_txn_from_context():
if getattr(session, 'bind', None):
    conn_url = str(session.bind.url).lower()
else:
    conn_url = str(session.connection().engine.url).lower()
The connection URL can be obtained (as observed in the debugger on my side) from session.bind.
# sqlalchemy/orm/session.py
# :param bind: An optional :class:`_engine.Engine` or
# :class:`_engine.Connection` to
# which this ``Session`` should be bound. When specified, all SQL
# operations performed by this session will execute via this
# connectable.
That means that if the session already has an established bind/connection to the database, we can use that instead of creating a new connection just to get the conn_url.
The fix also keeps a fallback to the old behavior: if session.bind is None, it creates a new session.connection() to get the URL.
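As a rough illustration of that fallback, the sketch below wraps the same if/else in a helper and runs it against stand-in objects. get_conn_url, FakeSession, FakeEngine, FakeConnection and the example URL are all hypothetical and exist only for this sketch; it is not the code in mech_calico.py. It also assumes that session.bind is an engine-like object exposing .url, which is what I observed; a bind that is a Connection might need different handling.

```python
class FakeEngine:
    """Hypothetical engine-like object exposing a .url attribute."""

    def __init__(self, url):
        self.url = url


class FakeConnection:
    """Hypothetical connection that points back at its engine."""

    def __init__(self, engine):
        self.engine = engine


class FakeSession:
    """Hypothetical session that counts how often connection() is called."""

    def __init__(self, engine, bind=None):
        self._engine = engine
        self.bind = bind
        self.connections_opened = 0

    def connection(self):
        self.connections_opened += 1
        return FakeConnection(self._engine)


def get_conn_url(session):
    """Prefer session.bind; only open a connection when there is no bind."""
    bind = getattr(session, 'bind', None)
    if bind:
        return str(bind.url).lower()
    return str(session.connection().engine.url).lower()


engine = FakeEngine("MySQL+PyMySQL://neutron@db/neutron")  # made-up URL

bound = FakeSession(engine, bind=engine)
bound_url = get_conn_url(bound)

unbound = FakeSession(engine)
unbound_url = get_conn_url(unbound)

print(bound_url, bound.connections_opened)
print(unbound_url, unbound.connections_opened)
```

With a bind present no connection is opened, so nothing is added to the transaction; without one, the helper behaves exactly like the original line.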
Steps to Reproduce (for bugs)
- Set up the OpenStack Caracal release
- Use calico as core_plugin for neutron-server
- Create a new security group
- Add a new security group rule to the created security group
Context
This issue was found after upgrading an OpenStack cluster from the Yoga release to Caracal.
The security group rule is created successfully, but the Neutron API returns HTTP 500. So far I have only encountered this issue when creating new security group rules, and always at the point where neutron tries to remove the quota reservation, as seen in the log at the top of this issue.
Your Environment
- Calico version: 3.28.1
- Operating System and version: Ubuntu 22.04.4 LTS
- Openstack version: 2024.1 (Caracal)
I'm sorry for another lengthy issue, I've tried to gather as much information as I could. It seems there are quite a lot of changes in the new OpenStack releases regarding neutron, and some of them are currently not compatible with using calico as the neutron core_plugin.
@nelljerram Can you please take a look?
@sp3c1k As you probably know, we don't officially support Caracal yet. I think it will be more efficient for us to investigate this kind of issue when we take on the job as a whole of supporting Caracal.
FYI I've kicked off a PR at https://github.com/projectcalico/calico/pull/9278 to start looking at Caracal - but I can't make any promises about how quickly that will progress.