airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Upgrading to airflow 2.4.0 from 2.3.4 causes NotNullViolation error

Open ManikandanUV opened this issue 2 years ago • 2 comments

Apache Airflow version

2.4.0

What happened

Stopped existing processes, upgraded from airflow 2.3.4 to 2.4.0, and ran airflow db upgrade successfully. Upon restarting the services, I'm not seeing any dag runs from the past 10 days. I kick off a new job, and I don't see it show up in the grid view. Upon checking the systemd logs, I see that there are a lot of postgress errors with webserver. Below is a sample of such errors.

` [SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id] [parameters: {'name': 'Datasets'}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,183] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 13, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id] [parameters: {'permission_id': 13, 'view_menu_id': None}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,209] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, Datasets).

[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id] [parameters: {'name': 'Datasets'}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,212] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 17, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id] [parameters: {'permission_id': 17, 'view_menu_id': None}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,229] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, DAG Warnings).

[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id] [parameters: {'name': 'DAG Warnings'}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,232] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 17, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id] [parameters: {'permission_id': 17, 'view_menu_id': None}] (Background on this error at: https://sqlalche.me/e/14/gkpj) [2022-09-19 14:03:16,250] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint DETAIL: Failing row contains (null, 13, 23). `

I tried running airflow db check, init, check-migration, upgrade without any errors, but the errors still remain.

Please let me know if I missed any steps during the upgrade, or if this is a known issue with a workaround.

What you think should happen instead

All dag runs should be visible

How to reproduce

upgrade airflow, upgrade db, restart the services

Operating System

Ubuntu 18.04.6 LTS

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

ManikandanUV avatar Sep 19 '22 18:09 ManikandanUV

Thanks for opening your first issue here! Be sure to follow the issue template!

boring-cyborg[bot] avatar Sep 19 '22 18:09 boring-cyborg[bot]

ah, seems to be the "backend version" of the same issue https://github.com/apache/airflow/issues/26505

----- update ----- maybe not... because i don't see those postgres INSERT errors from my webserver :thinking:

zachliu avatar Sep 20 '22 15:09 zachliu

Hello, we also facing this problem. Existing dags are showing up in the Airflow UI but we can't create a new dag. We are using dynamic dag creation to create new dags. After defining the new dag I ran the command "airflow dags reserialize" in the scheduler container. In the output I have had the same SQL like mentioned above.

We also use the officel Helm chart to deploy Airflow. We are using a postgres DB as backend. My feeling is that this is not related to #26505.

daniel-andresen-jimdo avatar Sep 26 '22 06:09 daniel-andresen-jimdo

I don’t think this is related to #26505, it looks like something in the permission stack not working well with AIP-48 stuff?

uranusjr avatar Sep 26 '22 06:09 uranusjr

We saw the same problem this past week with in an attempted upgrade to 2.4 from 2.1.0. We are using Ubuntu and Postgres in AWS with the scheduler/webserver on an EC2 instance sending work to a Kubernetes cluster on EKS. We tried both airflow db upgrade and airflow db reset. Same problem as OP with both approaches and we couldn't view/schedule/trigger jobs.

DAGs show up in the CLI and in the UI, but can't be found when you try to view job details in the UI. I noticed that the scheduler constantly throws errors about jobs not being in the serialized DAG table.

Manually running airflow dags reserialize immediately throws a bunch of these null constraint errors, suggesting some link, which is what led me to try the following workaround/experiment.

I used airflow db shell to reach into the DB and manipulate the id column in the four permission tables. I made these columns IDENTITY columns using ALTER TABLE ab_view_menu ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 5000). This appears to have resolved the issues. The errors went away and jobs can be viewed/scheduled/ triggered.

This worked and I'm including it here in case it helps diagnose the root cause, but I have no idea if this introduced other issues and am hesitant to promote this to our upper environments without more research.

Scheduler Errors image

View DAG in UI Error image

sterling-jackson avatar Sep 27 '22 15:09 sterling-jackson

Question: while this looks bad, does it actually cause any process to exit with an error?

ashb avatar Sep 27 '22 15:09 ashb

I upgraded our server from Ubuntu 18.04 to 20.04.5 and tried installing 2.4.0 again. The same issue persisted, so we've rolled back to 2.3.4. I have another server on 18.04 that successfully managed to upgrade from 2.3.4 to 2.4, so if you need me to make any comparison between the two servers (although the failing one is now on 20.04) to get to the root cause, let me know.

ManikandanUV avatar Sep 27 '22 15:09 ManikandanUV

This will be caused by a difference in the rows in the database, not in the OS version.

Is there any chance you could (privately if needed) share a DB dump of the breaking install?

ashb avatar Sep 27 '22 15:09 ashb

@ManikandanUV What version of SQLAlchemy and Flask-AppBuilder do you have installed in each server?

ashb avatar Sep 27 '22 15:09 ashb

@ashb Failing server: Name: SQLAlchemy Version: 1.4.41 Name: Flask-AppBuilder Version: 4.1.3

Working server: Name: SQLAlchemy Version: 1.4.40 Name: Flask-AppBuilder Version: 4.1.3

ManikandanUV avatar Sep 27 '22 15:09 ManikandanUV

@ashb how do I get the db dump? The servers have different dags, so they're not comparable in that aspect. Also, the working server started with 2.x (not sure which one, may be 2.2)->2.3.4->2.4, but the failing one was 1.x->1.10.15->2.3.4

ManikandanUV avatar Sep 27 '22 15:09 ManikandanUV

@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow. SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error: DAG seems to be missing from DagBag. Let me know if you need any additional info.

sfc-gh-mkmak avatar Sep 27 '22 16:09 sfc-gh-mkmak

@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow. SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error: DAG seems to be missing from DagBag. Let me know if you need any additional info.

From what version did you start using airflow? Asking so I can try to upgrade from that version to 2.3.4 to 2.4.0

ephraimbuddy avatar Sep 27 '22 16:09 ephraimbuddy

@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow. SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error: DAG seems to be missing from DagBag. Let me know if you need any additional info.

From what version did you start using airflow? Asking so I can try to upgrade from that version to 2.3.4 to 2.4.0

I think we started from version 1.10.14. But I saw some tests on version 1.10.12 - so I am not sure from which version our current database works - it can be 1.10.14 or 1.10.12

I tested it on new 2.2.4 deployment and upgraded it to 2.4.0. I have experienced the same problem.

sfc-gh-mkmak avatar Sep 27 '22 16:09 sfc-gh-mkmak

@ashb I attached examples of the two errors I saw in case it's helpful.

sterling-jackson avatar Sep 27 '22 17:09 sterling-jackson

We are seeing the same issue upgrading from 2.3.4 to 2.4.0

[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id]
[parameters: {'name': 'DAG:example_branch_datetime_operator_3'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
ERROR [airflow.www.fab_security.sqla.manager] Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 1, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id]
[parameters: {'permission_id': 1, 'view_menu_id': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
ERROR [airflow.www.fab_security.sqla.manager] Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 4, 139).

troyharvey avatar Sep 29 '22 02:09 troyharvey

Can anyone give us reproduction steps? Cos trying this with a "minimal" 2.3.4 and the upgrading to 2.4.0 neither myself nor @ephraimbuddy have been able to reproduce this, so we're not sure what step we're missing.

(And until we can reproduce it, we can't fix it)

ashb avatar Sep 29 '22 13:09 ashb

@ashb try our upgrade path, may be that's the key. I have two servers, one started at 1.x and other started at 2.x. The 2.x server upgraded to 2.3.4 and 2.4 without issues, where as the 1.x server upgraded to 2.3.4 and is now failing 2.4 upgrade from 2.3.4

ManikandanUV avatar Sep 29 '22 13:09 ManikandanUV

Thanks for the tip @sterling-jackson. Your suggestion resolved this one for us. If you use this solution alter the START WITH based on the max id in your tables.

ALTER TABLE public.ab_view_menu ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 300);
ALTER TABLE public.ab_permission_view ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 4000);
ALTER TABLE public.ab_permission_view_role ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 3000);

troyharvey avatar Sep 29 '22 14:09 troyharvey

Piling on here. I'm seeing the same thing. Seems to maybe be related Datasets?

[2022-09-29T09:55:10.909-0400] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" of relation "ab_view_menu" violates not-null constraint
DETAIL:  Failing row contains (null, Datasets).

We started on version 1.10.xx too and have been upgrading with each release.

Edit: I see now, there is a Datasets views that is added.

JCoder01 avatar Sep 29 '22 14:09 JCoder01

We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:

ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass);
ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass);
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass);
ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass);
ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass);
ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass);
ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass);
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

Which resolved our issue.

joshowen avatar Sep 29 '22 14:09 joshowen

Thanks @joshowen that's very helpful

ashb avatar Sep 29 '22 15:09 ashb

ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

We have the same problem after upgrading from 2.3.4 to 2.4.1

Thanks @joshowen, these commands resolved the problem.

stepanof avatar Oct 04 '22 14:10 stepanof

Okay, I've found the source of the confusion, and the path needed to trigger this behaviour.

Run airflow webserver with < 1.10.13 in RBAC mode, where FAB creates it's tables.

In 1.10.13 we introduces a migration that creates the tables with the server_default but that migration only did anything if the tables didn't already exist. But the tables created by the FAB model have a default (but not a server_default).

Oh, and the final bit of the puzzle, in 2.4 we finally "took control" of the FAB security models in to airflow and those do not have the default set.

I'll work on a new migration to fix this up.

ashb avatar Oct 05 '22 11:10 ashb

Ran into the same issue. We also started in 2017 which should be around version 1.8 something and have upgraded since then. I ran the commands @joshowen posted and now I can open the DAG again.

I am getting a airflow dags reserialize error though (DAGs already exist). Is it safe to delete from the serialize table and then reserialize again? It appears like everything is working fine as is, the only error is when I run that command in a container.

ldacey avatar Oct 07 '22 17:10 ldacey

not until a new user couldn't register did i realized i was having this issue and a ton of such error messages in the logs :joy:

zachliu avatar Oct 18 '22 16:10 zachliu

We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:

ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass);
ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass);
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass);
ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass);
ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass);
ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass);
ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass);
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

Which resolved our issue.

We ran into this issue upgrading from 2.3.1 to 2.4.1 so it doesn't seem the issue is fixed yet. These table alterations resolved the problem though.

aabouelleil-seniorlink avatar Nov 17 '22 14:11 aabouelleil-seniorlink

We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:

ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass);
ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass);
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass);
ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass);
ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass);
ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass);
ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass);
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

Which resolved our issue.

We ran into this issue upgrading from 2.3.1 to 2.4.1 so it doesn't seem the issue is fixed yet. These table alterations resolved the problem though.

@ashb These must be run before running 0073_2_0_0_prefix_dag_permissions.py

joshowen avatar Nov 28 '22 19:11 joshowen

Argh! Good catch

ashb avatar Nov 28 '22 20:11 ashb