pg_query icon indicating copy to clipboard operation
pg_query copied to clipboard

`DO ...` statement is normalized as an unparsable string

Open fbriand-docto opened this issue 3 years ago • 2 comments

Hi,

We need to parse normalized queries but we have an issue when we .normalize a DO ... statement since 2.2.0:

irb(main):001:0> normalized_sql = PgQuery.normalize("DO $$BEGIN RAISE NOTICE 'hello'; END$$;")
=> "DO $1;"
irb(main):002:0> PgQuery.parse(normalized_sql)
/Users/flora.briand/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/pg_query-2.2.0/lib/pg_query/parse.rb:3:in `parse_protobuf': syntax error at or near "$1" (scan.l:1236) (PgQuery::ParseError)
	from /Users/flora.briand/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/pg_query-2.2.0/lib/pg_query/parse.rb:3:in `parse'
	from (irb):4:in `<main>'
	from /Users/flora.briand/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
	from /Users/flora.briand/.rbenv/versions/3.1.2/bin/irb:25:in `load'
	from /Users/flora.briand/.rbenv/versions/3.1.2/bin/irb:25:in `<main>'

We did not have the issue in pg_query 2.1.4:

irb(main):001:0> normalized_sql = PgQuery.normalize("DO $$BEGIN RAISE NOTICE 'hello'; END$$;")
=> "DO $$BEGIN RAISE NOTICE 'hello'; END$$;"
irb(main):002:0> PgQuery.parse(normalized_sql)
=> #<PgQuery::ParserResult:0x0000000112e57b90
 @aliases=nil,
 @cte_names=nil,
 @functions=nil,
 @query="DO $$BEGIN RAISE NOTICE 'hello'; END$$;",
 @tables=nil,
 @tree=
  <PgQuery::ParseResult: version: 130003, stmts: [<PgQuery::RawStmt: stmt: <PgQuery::Node: do_stmt: <PgQuery::DoStmt: args: [<PgQuery::Node: def_elem: <PgQuery::DefElem: defnamespace: "", defname: "as", arg: <PgQuery::Node: string: <PgQuery::String: str: "BEGIN RAISE NOTICE 'hello'; END">>, defaction: :DEFELEM_UNSPEC, location: 3>>]>>, stmt_location: 0, stmt_len: 38>]>,
 @warnings=[]>

fbriand-docto avatar Nov 14 '22 16:11 fbriand-docto

@fbriand-docto Thanks for reaching out!

This was an intentional change in 2.2.0 to avoid data privacy-related issues with generated DO statements that contain personally identifiable information, such as produced by Sequelize (the Node.js ORM).

However I can see how the fact that this no longer parses is not ideal. I'll review how we can adjust the parser to support this case (we already have a parser patch for a few other cases).

Out of curiosity, what is your use case for pg_query, and do you typically work with DO statements a lot?

lfittl avatar Nov 14 '22 17:11 lfittl

@lfittl Thanks for your answer.

We use pg_query to analyze the queries, so for example to know if the query is a select or a write.

AFAIK we have 2 usages of the DO statement which are both used in the initialization of our tests. One to setup a read-only user and the other one to empty all tables.

fbriand-docto avatar Nov 15 '22 09:11 fbriand-docto