airflow
airflow copied to clipboard
Upgrading to airflow 2.4.0 from 2.3.4 causes NotNullViolation error
Apache Airflow version
2.4.0
What happened
Stopped existing processes, upgraded from airflow 2.3.4 to 2.4.0, and ran airflow db upgrade successfully. Upon restarting the services, I'm not seeing any dag runs from the past 10 days. I kick off a new job, and I don't see it show up in the grid view. Upon checking the systemd logs, I see that there are a lot of postgress errors with webserver. Below is a sample of such errors.
` [SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id] [parameters: {'name': 'Datasets'}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,183] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 13, null).
[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id] [parameters: {'permission_id': 13, 'view_menu_id': None}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,209] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, Datasets).
[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id] [parameters: {'name': 'Datasets'}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,212] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 17, null).
[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id] [parameters: {'permission_id': 17, 'view_menu_id': None}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,229] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, DAG Warnings).
[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id] [parameters: {'name': 'DAG Warnings'}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,232] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 17, null).
[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id] [parameters: {'permission_id': 17, 'view_menu_id': None}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,250] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 13, 23). `
I tried running airflow db check, init, check-migration, upgrade without any errors, but the errors still remain.
Please let me know if I missed any steps during the upgrade, or if this is a known issue with a workaround.
What you think should happen instead
All dag runs should be visible
How to reproduce
upgrade airflow, upgrade db, restart the services
Operating System
Ubuntu 18.04.6 LTS
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template!
ah, seems to be the "backend version" of the same issue https://github.com/apache/airflow/issues/26505
----- update ----- maybe not... because i don't see those postgres INSERT errors from my webserver :thinking:
Hello, we also facing this problem. Existing dags are showing up in the Airflow UI but we can't create a new dag. We are using dynamic dag creation to create new dags. After defining the new dag I ran the command "airflow dags reserialize" in the scheduler container. In the output I have had the same SQL like mentioned above.
We also use the officel Helm chart to deploy Airflow. We are using a postgres DB as backend. My feeling is that this is not related to #26505.
I don’t think this is related to #26505, it looks like something in the permission stack not working well with AIP-48 stuff?
We saw the same problem this past week with in an attempted upgrade to 2.4 from 2.1.0. We are using Ubuntu and Postgres in AWS with the scheduler/webserver on an EC2 instance sending work to a Kubernetes cluster on EKS. We tried both airflow db upgrade
and airflow db reset
. Same problem as OP with both approaches and we couldn't view/schedule/trigger jobs.
DAGs show up in the CLI and in the UI, but can't be found when you try to view job details in the UI. I noticed that the scheduler constantly throws errors about jobs not being in the serialized DAG table.
Manually running airflow dags reserialize
immediately throws a bunch of these null constraint errors, suggesting some link, which is what led me to try the following workaround/experiment.
I used airflow db shell
to reach into the DB and manipulate the id column in the four permission tables. I made these columns IDENTITY columns using ALTER TABLE ab_view_menu ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 5000)
. This appears to have resolved the issues. The errors went away and jobs can be viewed/scheduled/ triggered.
This worked and I'm including it here in case it helps diagnose the root cause, but I have no idea if this introduced other issues and am hesitant to promote this to our upper environments without more research.
Scheduler Errors
View DAG in UI Error
Question: while this looks bad, does it actually cause any process to exit with an error?
I upgraded our server from Ubuntu 18.04 to 20.04.5 and tried installing 2.4.0 again. The same issue persisted, so we've rolled back to 2.3.4. I have another server on 18.04 that successfully managed to upgrade from 2.3.4 to 2.4, so if you need me to make any comparison between the two servers (although the failing one is now on 20.04) to get to the root cause, let me know.
This will be caused by a difference in the rows in the database, not in the OS version.
Is there any chance you could (privately if needed) share a DB dump of the breaking install?
@ManikandanUV What version of SQLAlchemy
and Flask-AppBuilder
do you have installed in each server?
@ashb Failing server: Name: SQLAlchemy Version: 1.4.41 Name: Flask-AppBuilder Version: 4.1.3
Working server: Name: SQLAlchemy Version: 1.4.40 Name: Flask-AppBuilder Version: 4.1.3
@ashb how do I get the db dump? The servers have different dags, so they're not comparable in that aspect. Also, the working server started with 2.x (not sure which one, may be 2.2)->2.3.4->2.4, but the failing one was 1.x->1.10.15->2.3.4
@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow.
SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error: DAG seems to be missing from DagBag
. Let me know if you need any additional info.
@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow. SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error:
DAG seems to be missing from DagBag
. Let me know if you need any additional info.
From what version did you start using airflow? Asking so I can try to upgrade from that version to 2.3.4 to 2.4.0
@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow. SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error:
DAG seems to be missing from DagBag
. Let me know if you need any additional info.From what version did you start using airflow? Asking so I can try to upgrade from that version to 2.3.4 to 2.4.0
I think we started from version 1.10.14
. But I saw some tests on version 1.10.12
- so I am not sure from which version our current database works - it can be 1.10.14
or 1.10.12
I tested it on new 2.2.4 deployment and upgraded it to 2.4.0. I have experienced the same problem.
@ashb I attached examples of the two errors I saw in case it's helpful.
We are seeing the same issue upgrading from 2.3.4
to 2.4.0
[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id]
[parameters: {'name': 'DAG:example_branch_datetime_operator_3'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
ERROR [airflow.www.fab_security.sqla.manager] Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, 1, null).
[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id]
[parameters: {'permission_id': 1, 'view_menu_id': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
ERROR [airflow.www.fab_security.sqla.manager] Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, 4, 139).
Can anyone give us reproduction steps? Cos trying this with a "minimal" 2.3.4 and the upgrading to 2.4.0 neither myself nor @ephraimbuddy have been able to reproduce this, so we're not sure what step we're missing.
(And until we can reproduce it, we can't fix it)
@ashb try our upgrade path, may be that's the key. I have two servers, one started at 1.x and other started at 2.x. The 2.x server upgraded to 2.3.4 and 2.4 without issues, where as the 1.x server upgraded to 2.3.4 and is now failing 2.4 upgrade from 2.3.4
Thanks for the tip @sterling-jackson. Your suggestion resolved this one for us. If you use this solution alter the START WITH
based on the max id in your tables.
ALTER TABLE public.ab_view_menu ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 300);
ALTER TABLE public.ab_permission_view ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 4000);
ALTER TABLE public.ab_permission_view_role ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 3000);
Piling on here. I'm seeing the same thing. Seems to maybe be related Datasets?
[2022-09-29T09:55:10.909-0400] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" of relation "ab_view_menu" violates not-null constraint
DETAIL: Failing row contains (null, Datasets).
We started on version 1.10.xx too and have been upgrading with each release.
Edit: I see now, there is a Datasets views that is added.
We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:
ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass);
ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass);
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass);
ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass);
ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass);
ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass);
ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass);
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);
Which resolved our issue.
Thanks @joshowen that's very helpful
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);
We have the same problem after upgrading from 2.3.4 to 2.4.1
Thanks @joshowen, these commands resolved the problem.
Okay, I've found the source of the confusion, and the path needed to trigger this behaviour.
Run airflow webserver with < 1.10.13 in RBAC mode, where FAB creates it's tables.
In 1.10.13 we introduces a migration that creates the tables with the server_default but that migration only did anything if the tables didn't already exist. But the tables created by the FAB model have a default (but not a server_default).
Oh, and the final bit of the puzzle, in 2.4 we finally "took control" of the FAB security models in to airflow and those do not have the default
set.
I'll work on a new migration to fix this up.
Ran into the same issue. We also started in 2017 which should be around version 1.8 something and have upgraded since then. I ran the commands @joshowen posted and now I can open the DAG again.
I am getting a airflow dags reserialize
error though (DAGs already exist). Is it safe to delete from the serialize table and then reserialize again? It appears like everything is working fine as is, the only error is when I run that command in a container.
not until a new user couldn't register did i realized i was having this issue and a ton of such error messages in the logs :joy:
We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:
ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass); ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass); ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass); ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass); ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass); ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass); ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass); ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);
Which resolved our issue.
We ran into this issue upgrading from 2.3.1 to 2.4.1 so it doesn't seem the issue is fixed yet. These table alterations resolved the problem though.
We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:
ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass); ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass); ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass); ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass); ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass); ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass); ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass); ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);
Which resolved our issue.
We ran into this issue upgrading from 2.3.1 to 2.4.1 so it doesn't seem the issue is fixed yet. These table alterations resolved the problem though.
@ashb These must be run before running 0073_2_0_0_prefix_dag_permissions.py
Argh! Good catch