pg_sample icon indicating copy to clipboard operation
pg_sample copied to clipboard

Using --schema flag for having dump schema level granularity and make COPY less error prone

Open andilabs opened this issue 5 years ago • 12 comments

Using --schema flag for having dump schema level granularity improved per schema filtering, make COPY less error prone changed DELIMITER, QUOTE and ESCAPE for better handling of complicated data type fields (JSON in particular) added explicitly the table collumns listed in COPY FROM to handle situation when the order of collumns differs between source and destination databases added flag --password to explicitly ask for password when pg_dump needed (i.e each run without --data-only flag) commented out standard_conforming_strings = off so this will be set to default value, which is on and is needed for import with \t delimiter introduced the requirement that --sample_schema if specified must include _pg_sample in name to avoid risk of accidentally delete when used with --force

andilabs avatar Jan 31 '20 10:01 andilabs

@mla I am not much into Perl. Any kind of feedback will be welcome ✊

andilabs avatar Feb 01 '20 14:02 andilabs

These look like great changes, @andilabs. Why was --password option added to pg_dump?

mla avatar Feb 05 '20 16:02 mla

I saw your comment on the change. Just wondering if it's actually necessary since pg_dump says:

--password
           Force pg_dump to prompt for a password before connecting to a
           database.

           This option is never essential, since pg_dump will automatically
           prompt for a password if the server demands password
           authentication. However, pg_dump will waste a connection attempt
           finding out that the server wants a password. In some cases it is
           worth typing -W to avoid the extra connection attempt.

mla avatar Feb 05 '20 16:02 mla

I find out this flag necessary - at least for my production db. I was also wondering when I red in the docs that it isn’t needed 🤷🏼‍♂️

On Wed, 5 Feb 2020 at 17:31, mla [email protected] wrote:

I saw your comment on the change. Just wondering if it's actually necessary since pg_dump says:

--password Force pg_dump to prompt for a password before connecting to a database.

       This option is never essential, since pg_dump will automatically
       prompt for a password if the server demands password
       authentication. However, pg_dump will waste a connection attempt
       finding out that the server wants a password. In some cases it is
       worth typing -W to avoid the extra connection attempt.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mla/pg_sample/pull/14?email_source=notifications&email_token=AAHS6N3E3CUFY5UDFI3TQLTRBLSWZA5CNFSM4KOFEN62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEK4CH7I#issuecomment-582493181, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHS6N3RX62IHMSF3P76XM3RBLSWZANCNFSM4KOFEN6Q .

-- Pozdrowienia, Andrzej

andilabs avatar Feb 05 '20 19:02 andilabs

Hi @mla how are you doing in coronavirus times? I hope you are doing well! What about merging this PR?

Cheers! Andi

andilabs avatar Apr 17 '20 08:04 andilabs

Hey @andilabs. Doing okay, thanks. Hope you are too. Yes, sorry for the delay on this. Will target this weekend.

mla avatar Apr 17 '20 15:04 mla

@andilabs can you explain what issue you hit with the COPY being error prone? Or even better, an example test case?

From the project dir, you should be able to run: prove -v to run the existing tests.

On the standard_conforming_strings setting, I see you commented it out with a comment that we'll use the Pg default. But isn't it more robust to know what the setting is? I don't particularly care if setting it on or off is better, but seems like we should try to minimize any configuration differences between environments.

I've working in the dev branch and working through your changes if you want to see the current state. The splitting of schema and table name shouldn't be necessary. The values in @table are Table instances, so we should be able to inspect the schema() and table() accessors directly.

If you could just explain what the string_agg stuff and adding CSV DELIMITER specifications and such to the COPY so I can better understand what problem was being triggered.

Thanks!

mla avatar Apr 18 '20 18:04 mla

Also, I would rather we not add the --password option to the pg commands. The pg_dump docs state, for example:

-W
       --password
           Force pg_dump to prompt for a password before connecting to a
           database.

           This option is never essential, since pg_dump will automatically
           prompt for a password if the server demands password
           authentication. However, pg_dump will waste a connection attempt
           finding out that the server wants a password. In some cases it is
           worth typing -W to avoid the extra connection attempt.

So what is happening for you? It is not prompting you for a password?

We already support the connection params as part of the main script, so prob those should just be propagated to the pg_dump call if supplied.

mla avatar Apr 19 '20 16:04 mla

I guess there's no great way to pass the password securely. There is a PGPASSWORD env var but it's not recommended to use.

There's a PGPASSFILE var that we could use but seems like a bit of a pain. We could create a correctly formatted pgpass file and point to that. https://www.postgresql.org/docs/9.6/libpq-pgpass.html

Or we could pass the --password option to pg_dump if it was supplied to pg_sample. But I thought that would be triggered automatically.

mla avatar Apr 19 '20 16:04 mla

  • ok let's get rid of password opt-out good practice of using pgpass
  • the case for more error-prone COPY was connected to dealing with nested JSONB fields escaping. It's a bit pitty task for me to find examples now, because even if I find it, then I will have to clean/fake/ those data being confidential production system data.
  • I added some comments in MR too.
  • regarding The values in @table are Table instances, so we should be able to inspect the schema() and table() accessors directly. excuses this. I might be miss using perl because of lacking experience in it.

andilabs avatar Apr 30 '20 16:04 andilabs

@mla Will this one get merged?

justin808 avatar Oct 22 '20 07:10 justin808

@mla Will this one get merged?

What features from this are you looking for? Are you hitting a problem?

mla avatar Oct 23 '20 16:10 mla