gpdb
gpdb copied to clipboard
Fix gplogfilter csv generation
The result of gplogfilter is ambiguously perceived by parsers. Fields in generated csv containing line breaks, double quotes, and csv delimeter doesn't enclosed in double-quotes. To fix this, the standard csv.writer class is used to generate csv.
gplogfilter output
2021-03-09 00:51:12.533081 MSK|tim|postgres|p43037|th1988012416|[local]||2021-03-09 00:51:00 MSK|0|con26|cmd1|seg-1||||sx1|LOG: |00000|statement: select
'a',
'|',
'"',
'b'
from generate_series(1,2);||||||select
'a',
'|',
'"',
'b'
from generate_series(1,2);|0||postgres.c|1636|
2021-03-09 00:51:18.239432 MSK|tim|postgres|p43037|th1988012416|[local]||2021-03-09 00:51:00 MSK|0|con26|cmd3|seg-1||||sx1|LOG: |00000|statement: select 'a' || '|c' || '"b';||||||select 'a' || '|c' || '"b';|0||postgres.c|1636|
2021-03-09 00:51:22.133374 MSK|tim|postgres|p43037|th1988012416|[local]||2021-03-09 00:51:00 MSK|0|con26|cmd5|seg-1||||sx1|LOG: |00000|statement: select 1 | 1 from generate_series(1, 2);||||||select 1 | 1 from generate_series(1, 2);|0||postgres.c|1636|
csv standart: https://www.rfc-editor.org/rfc/rfc4180.txt
Steps to reproduce
- run one of the queries
select
'a',
'|',
'"',
'b'
from generate_series(1,2);
select 'a' || '|c' || '"b';
select 1 | 1 from generate_series(1, 2);
- call
gplogfilterand save result to some file - try to import result csv to table or some csv viewer
Importing into gpdb:
create temp table test as select * from gp_toolkit.gp_log_system limit 0;
\copy test from /tmp/log.csv with (format csv, delimiter '|');
result:
ERROR: invalid input syntax for type timestamp with time zone: "'a',"
CONTEXT: COPY test, line 4, column logtime: "'a',"
I am (personally) worried that this will break some awk script. like this
gplogfilter datadirs/**/**.csv 2>/dev/null | awk -F '|' '{print $19}' | grep statement | sed 's/statement://g'
I'm not sure if someone really has this usage, but backward compatibility means keeping your mistakes.
can you send this patch to master? gplogfilter will support py3 after #12373.
for CI, please rebase to 6X_STABLE, the error is not in your code.
@t1mursadykov and @Sasasu Seems this PR stays for months, is it still actionable or OK to close if not ?
I am closing this PR, the change has merged into gpdb 7 in https://github.com/greenplum-db/gpdb/commit/6603e4af57f878315942c50859003d1c123411be.
for gpdb 6, I think this is a behavior break, if this fix is really needed, consider adding --strict-csv parameter or something like this.