squawk icon indicating copy to clipboard operation
squawk copied to clipboard

Ignore specific query

Open ccakes opened this issue 4 years ago • 11 comments

I'm integrating Squawk into a CI pipeline and want to be able to tell it to ignore specific queries, similar to

#[allow(clippy::some_rule)]
// some code that would fail

So we'd have something like

/* Trust me, I'm an expert */
-- squawk: nolint
ALTER TABLE foo ALTER COLUMN bar SET NOT NULL;

I couldn't find a way to do that, if there one? And if not, would you be open to a PR adding that?

ccakes avatar Jun 15 '21 04:06 ccakes

yeah squawk doesn't have a way to do that currently, I'm open to the feature

I think it might be tricky to add if the AST doesn't include comments, might need to rework things a bit

sbdchd avatar Jun 15 '21 13:06 sbdchd

Hmm yeah - I see.

It looks like libpg_query v2 exposes pg_query_scan() and pg_query_split_with_xxx(). It might be doable with some combination of those before the parse stage 🤔

ccakes avatar Jun 16 '21 06:06 ccakes

Any updates on this? This is a very useful feature

saurabhbhalla90 avatar Feb 15 '22 00:02 saurabhbhalla90

It seems like pg_query.rs has a CommentStmt, which might work.

Here's the related pg_query.proto.

chdsbd avatar Feb 15 '22 01:02 chdsbd

@chdsbd I took a stab at using that, but the CommentStmt is for an actual SQL COMMENT ON https://www.postgresql.org/docs/current/sql-comment.html

The downside of using this is that it actually changes the DB schema and you can only have one comment per object, so it would overwrite any existing comment on the object.

Hopefully there is a way to get the actual code comment i.e. -- I am an SQL comment

adamrdavid avatar Mar 15 '22 21:03 adamrdavid

@adamrdavid That's too bad that doesn't work.

I feel like we need to do something ourselves to parse our comments vs SQL.

chdsbd avatar Mar 15 '22 23:03 chdsbd

It looks like the parser gives you locations in the original SQL string.

Take a look at the parser schema that was linked earlier: https://github.com/pganalyze/libpg_query/blob/d1d0186b80c1fbf496265789af89da4e7ca890ab/protobuf/pg_query.proto Search the file for the word "location" - there are tons of hits! So that's the starting clue to figure out where in the original string the linter violation was found.

The next tool could be to look at the the lexer. Here you can see from the example that it clearly shows a CComment token with locations that correlate to a SQL comment in the query: https://docs.rs/pg_query/latest/pg_query/fn.scan.html --- so even though you can't get the code comments in the parse tree, you can still get it from the lexer.

Ideally the pg_query library would have an option for passing the output of the scan function to the parse function, so you can inspect the lexer output while also using it to parse. Instead I guess you have to pass the SQL string to the parser as well, which presumably would scan it a second time, and thus the lexer will run twice. Oh well. A worthwhile price to pay for this feature IMHO, at least until the API for pg_query could be improved in this area. I'm only going to use Squawk in CI/CD to scan a few KB of migrations that are about to be run; I'm not going to scan all my historical migrations & thus I do not care if Squawk runs slower as a result.

Putting it all together:

  1. Run the scanner to get a list of tokens. Build a mapping of file line number to subset of the code comment token(s) on that line, or something like that. (need a way to efficiently locate neighboring lines from the lexer when a linting violation is found while traversing the parsed AST).
  2. When a lint violation is found while examining the AST, get the location integer from the parse tree.
  3. Convert that location to a line number. Look up the same line or prior line using the mapping built in step 1 and see if there was any relevant comment containing a linter directive.

Can we just pre-process the file and strip out any SQL statements that are directly preceded by a -- squawk: nolint comment?

bmorganpa avatar Jan 05 '24 16:01 bmorganpa

@bmorganpa I think that would work. SQL statements are delimited with semi-colons, so that could help with parsing too.

I'm open to PRs supporting this feature

chdsbd avatar Jan 12 '24 04:01 chdsbd

For right now we just added a wrapper script around it to handle git diffs and just added a sed command to trim statements with squawk:ignore-next-statement using: sed '/squawk:ignore-next-statement/,/;$/d'

bmorganpa avatar Jan 12 '24 18:01 bmorganpa