gpdb icon indicating copy to clipboard operation
gpdb copied to clipboard

Fix gplogfilter csv generation

Open t1mursadykov opened this issue 4 years ago • 1 comments

The result of gplogfilter is ambiguously perceived by parsers. Fields in generated csv containing line breaks, double quotes, and csv delimeter doesn't enclosed in double-quotes. To fix this, the standard csv.writer class is used to generate csv.

gplogfilter output

2021-03-09 00:51:12.533081 MSK|tim|postgres|p43037|th1988012416|[local]||2021-03-09 00:51:00 MSK|0|con26|cmd1|seg-1||||sx1|LOG: |00000|statement: select
'a',
'|',
'"',
'b'
from generate_series(1,2);||||||select
'a',
'|',
'"',
'b'
from generate_series(1,2);|0||postgres.c|1636|
2021-03-09 00:51:18.239432 MSK|tim|postgres|p43037|th1988012416|[local]||2021-03-09 00:51:00 MSK|0|con26|cmd3|seg-1||||sx1|LOG: |00000|statement: select 'a' || '|c' || '"b';||||||select 'a' || '|c' || '"b';|0||postgres.c|1636|
2021-03-09 00:51:22.133374 MSK|tim|postgres|p43037|th1988012416|[local]||2021-03-09 00:51:00 MSK|0|con26|cmd5|seg-1||||sx1|LOG: |00000|statement: select 1 | 1 from generate_series(1, 2);||||||select 1 | 1 from generate_series(1, 2);|0||postgres.c|1636|

csv standart: https://www.rfc-editor.org/rfc/rfc4180.txt

Steps to reproduce

  1. run one of the queries
select
'a',
'|',
'"',
'b'
from generate_series(1,2);

select 'a' || '|c' || '"b';

select 1 | 1 from generate_series(1, 2);
  1. call gplogfilter and save result to some file
  2. try to import result csv to table or some csv viewer

Importing into gpdb:

create temp table test as select * from gp_toolkit.gp_log_system limit 0;
\copy test from /tmp/log.csv with (format csv, delimiter '|');

result:

ERROR:  invalid input syntax for type timestamp with time zone: "'a',"
CONTEXT:  COPY test, line 4, column logtime: "'a',"

t1mursadykov avatar Mar 08 '21 22:03 t1mursadykov

I am (personally) worried that this will break some awk script. like this

gplogfilter datadirs/**/**.csv 2>/dev/null | awk -F '|' '{print $19}' | grep statement | sed 's/statement://g'

I'm not sure if someone really has this usage, but backward compatibility means keeping your mistakes.

can you send this patch to master? gplogfilter will support py3 after #12373.

for CI, please rebase to 6X_STABLE, the error is not in your code.

Sasasu avatar Jul 28 '21 03:07 Sasasu

@t1mursadykov and @Sasasu Seems this PR stays for months, is it still actionable or OK to close if not ?

haolinw avatar Sep 28 '22 02:09 haolinw

I am closing this PR, the change has merged into gpdb 7 in https://github.com/greenplum-db/gpdb/commit/6603e4af57f878315942c50859003d1c123411be. for gpdb 6, I think this is a behavior break, if this fix is really needed, consider adding --strict-csv parameter or something like this.

Sasasu avatar Oct 10 '22 02:10 Sasasu