cli icon indicating copy to clipboard operation
cli copied to clipboard

Enable `COPY` for seed and migrations

Open ethicnology opened this issue 1 year ago • 4 comments

Describe the bug INSERT INTO is slow for data dump in comparison to COPY when you need to scale, COPY is also the default export with pg_dump for postgres.

Migrations and seed.sql that contains COPY statement fails.

To Reproduce Steps to reproduce the behavior:

  1. Create a migration with the COPY statement to feed your DB
COPY auth.users (instance_id, id, aud, role, email, encrypted_password, email_confirmed_at, invited_at, confirmation_token, confirmation_sent_at, recovery_token, recovery_sent_at, email_change_token_new, email_change, email_change_sent_at, last_sign_in_at, raw_app_meta_data, raw_user_meta_data, is_super_admin, created_at, updated_at, phone, phone_confirmed_at, phone_change, phone_change_token, phone_change_sent_at, email_change_token_current, email_change_confirm_status, banned_until, reauthentication_token, reauthentication_sent_at, is_sso_user, deleted_at, is_anonymous) FROM stdin;
00000000-0000-0000-0000-000000000000	e3dc0b91-c49a-47f8-8421-79001580962b	authenticated	authenticated	[email protected]	$2a$10$HVoLo3KAEmuKkr.3hnARPOubNZOrDCxROmo31AX7rRNCOKsmPBcdq	2024-06-08 17:41:47.920181+00	\N		2024-06-08 17:41:23.570749+00		\N			\N	2024-06-08 17:42:23.982138+00	{"provider": "email", "providers": ["email"]}	{"sub": "e3dc0b91-c49a-47f8-8421-79001580962b", "email": "[email protected]", "email_verified": false, "phone_verified": false}	\N	2024-06-08 17:41:23.553288+00	2024-06-08 17:42:23.987962+00	\N	\N			\N		0	\N		\N	f	\N	f
\.
  1. supabase db reset

Expected behavior Successfull migration

Screenshots

Applying migration 20240827020350_my_migration.sql...
FATAL: terminating connection because protocol synchronization was lost (SQLSTATE 08P01)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
At statement 10: --                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
-- TOC entry 3955 (class 0 OID 29188)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
-- Dependencies: 287                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
-- Data for Name: users; Type: TABLE DATA; Schema: auth; Owner: supabase_auth_admin                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
--                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
COPY auth.users (instance_id, id, aud, role, email, encrypted_password, email_confirmed_at, invited_at, confirmation_token, confirmation_sent_at, recovery_token, recovery_sent_at, email_change_token_new, email_change, email_change_sent_at, last_sign_in_at, raw_app_meta_data, raw_user_meta_data, is_super_admin, created_at, updated_at, phone, phone_confirmed_at, phone_change, phone_change_token, phone_change_sent_at, email_change_token_current, email_change_confirm_status, banned_until, reauthentication_token, reauthentication_sent_at, is_sso_user, deleted_at, is_anonymous) FROM stdin

System information Rerun the failing command with --create-ticket flag.

  • Ticket ID: f06acc6824804d0a8044a1a8bd6d1108
  • Version of OS: MacOS 14.1.1
  • Version of CLI: 1.190.0
  • Version of Docker: Docker version 27.1.1, build 6312585
  • Versions of services: N/A

ethicnology avatar Aug 27 '24 02:08 ethicnology

Can you share the time it took for you and the size of database you are seeding? The default multiline insert statements emitted by supabase db dump is very fast for seeding small datasets (10k+ rows).

Unfortunately the syntax of COPY ... FROM STDIN; ... \. is specific to psql client. It will take some engineering effort to build it with pgx while handling all the edge cases around sql parsing. We could also explore using psql directly to support directory of fixtures https://github.com/supabase/cli/issues/2026.

That said, the current recommendation for seeding larger datasets is to use supabase db dump --use-copy to emit sql with copy statements, followed by running psql -f seed.sql separately.

sweatybridge avatar Aug 27 '24 03:08 sweatybridge

Understood, thank you for this quick answer 🥇

I'm seeding 400k+ rows the problem is when you export the data, the database using COPY statement is fast, while pg_dump --inserts is slower. It's also a bit less readable, IMO.

We can mix them, but it's less convenient than trigger a supabase db reset to be sure that everything works as intended rather than making an external custom script that combine it with psql.

PS: Is there a way to split the seed.sql across multiple files, actually I'm creating fixtures into migrations folder to achieve it.

ethicnology avatar Aug 27 '24 12:08 ethicnology

PS: Is there a way to split the seed.sql across multiple files, actually I'm creating fixtures into migrations folder to achieve it.

Hey ! We just merged https://github.com/supabase/cli/pull/2702 that should help with that.

avallete avatar Sep 27 '24 05:09 avallete

great thank you @avallete

ethicnology avatar Sep 27 '24 11:09 ethicnology