pg_sample
Use the --schema flag for schema-level dump granularity and make COPY less error-prone
This PR:
- uses the --schema flag for schema-level dump granularity (improved per-schema filtering)
- makes COPY less error-prone: changed DELIMITER, QUOTE and ESCAPE for better handling of complicated data-type fields (JSON in particular)
- explicitly lists the table columns in COPY FROM, to handle the situation where the column order differs between the source and destination databases
- adds the --password flag to explicitly ask for a password when pg_dump is needed (i.e. each run without the --data-only flag)
- comments out standard_conforming_strings = off, so it falls back to the default value, which is on and is needed for import with the \t delimiter
- introduces the requirement that --sample_schema, if specified, must include _pg_sample in its name, to avoid the risk of accidental deletion when used with --force
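A rough sketch of the kind of COPY header these changes aim at: the destination columns listed explicitly, plus CSV options so that nested JSONB values round-trip cleanly. The table name "public.events", its columns, and the exact option values are invented for illustration; they are not taken from the actual patch.

```shell
# Hypothetical illustration only: table and column names are made up.
# Build a COPY header with an explicit column list and CSV options.
COLS='id, payload, created_at'
COPY_STMT="COPY public.events ($COLS) FROM stdin WITH (FORMAT csv, DELIMITER E'\t', QUOTE '\"', ESCAPE '\\')"
echo "$COPY_STMT"
```

Listing the columns means the import no longer depends on the source and destination tables having the same column order; the CSV QUOTE/ESCAPE settings keep embedded quotes and backslashes inside JSON values intact.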
@mla I am not much into Perl. Any kind of feedback is welcome ✊
These look like great changes, @andilabs. Why was the --password option added to pg_dump?
I saw your comment on the change. Just wondering if it's actually necessary since pg_dump says:
--password
Force pg_dump to prompt for a password before connecting to a
database.
This option is never essential, since pg_dump will automatically
prompt for a password if the server demands password
authentication. However, pg_dump will waste a connection attempt
finding out that the server wants a password. In some cases it is
worth typing -W to avoid the extra connection attempt.
I found this flag necessary, at least for my production DB. I was also wondering about that when I read in the docs that it isn't needed 🤷🏼♂️
Hi @mla, how are you doing in these coronavirus times? I hope you are doing well! What about merging this PR?
Cheers! Andi
Hey @andilabs. Doing okay, thanks. Hope you are too. Yes, sorry for the delay on this. Will target this weekend.
@andilabs can you explain what issue you hit with the COPY being error prone? Or even better, an example test case?
From the project dir, you should be able to run: prove -v to run the existing tests.
On the standard_conforming_strings setting, I see you commented it out with a comment that we'll use the Pg default. But isn't it more robust to know what the setting is? I don't particularly care if setting it on or off is better, but seems like we should try to minimize any configuration differences between environments.
I'm working in the dev branch, going through your changes, if you want to see the current state. The splitting of schema and table name shouldn't be necessary. The values in @table are Table instances, so we should be able to inspect the schema() and table() accessors directly.
Could you just explain the string_agg stuff and the CSV DELIMITER specifications and such that you added to the COPY, so I can better understand what problem was being triggered?
Thanks!
Also, I would rather we not add the --password option to the pg commands. The pg_dump docs state, for example:
-W
--password
Force pg_dump to prompt for a password before connecting to a
database.
This option is never essential, since pg_dump will automatically
prompt for a password if the server demands password
authentication. However, pg_dump will waste a connection attempt
finding out that the server wants a password. In some cases it is
worth typing -W to avoid the extra connection attempt.
So what is happening for you? It is not prompting you for a password?
We already support the connection params in the main script, so those should probably just be propagated to the pg_dump call if supplied.
I guess there's no great way to pass the password securely. There is a PGPASSWORD env var but it's not recommended to use.
There's a PGPASSFILE var that we could use, but it seems like a bit of a pain. We could create a correctly formatted pgpass file and point to that. https://www.postgresql.org/docs/9.6/libpq-pgpass.html
Or we could pass the --password option to pg_dump if it was supplied to pg_sample. But I thought that would be triggered automatically.
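For reference, the PGPASSFILE route could look something like the sketch below: write a correctly formatted pgpass file, lock its permissions down to 0600 (libpq silently ignores the file otherwise), and export PGPASSFILE before invoking pg_dump. The host, database, user, and password are placeholders, not values from this project.

```shell
# Sketch only; host/db/user/password below are placeholders.
PGPASS_TMP=$(mktemp)
# pgpass line format: hostname:port:database:username:password
printf 'db.example.com:5432:proddb:appuser:s3cret\n' > "$PGPASS_TMP"
chmod 600 "$PGPASS_TMP"          # libpq requires 0600 or stricter
export PGPASSFILE="$PGPASS_TMP"
# pg_dump --schema-only proddb   # would now authenticate without prompting
```

This keeps the password off the command line and out of the environment, at the cost of managing a temporary file's lifetime.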
- OK, let's get rid of the --password option in favor of the good practice of using pgpass.
- The error-prone COPY case was connected to escaping nested JSONB fields. It's a bit of a painful task for me to find examples now, because even if I find one, I'll have to clean/fake the data, since it's confidential production data.
- I added some comments in MR too.
- Regarding "The values in @table are Table instances, so we should be able to inspect the schema() and table() accessors directly": excuse this; I may be misusing Perl for lack of experience with it.
@mla Will this one get merged?
What features from this are you looking for? Are you hitting a problem?