squawk
Ignore specific query
I'm integrating Squawk into a CI pipeline and want to be able to tell it to ignore specific queries, similar to
#[allow(clippy::some_rule)]
// some code that would fail
So we'd have something like
/* Trust me, I'm an expert */
-- squawk: nolint
ALTER TABLE foo ALTER COLUMN bar SET NOT NULL;
I couldn't find a way to do that. Is there one? And if not, would you be open to a PR adding it?
yeah squawk doesn't have a way to do that currently, I'm open to the feature
I think it might be tricky to add if the AST doesn't include comments, might need to rework things a bit
Hmm yeah - I see.
It looks like libpg_query v2 exposes pg_query_scan() and pg_query_split_with_xxx(). It might be doable with some combination of those before the parse stage 🤔
Any updates on this? This is a very useful feature
@chdsbd I took a stab at using that, but the CommentStmt is for an actual SQL COMMENT ON https://www.postgresql.org/docs/current/sql-comment.html
The downside of using this is that it actually changes the DB schema and you can only have one comment per object, so it would overwrite any existing comment on the object.
Hopefully there is a way to get the actual code comment i.e. -- I am an SQL comment
@adamrdavid That's too bad that doesn't work.
I feel like we need to do something ourselves to separate code comments from the SQL.
It looks like the parser gives you locations in the original SQL string.
Take a look at the parser schema that was linked earlier: https://github.com/pganalyze/libpg_query/blob/d1d0186b80c1fbf496265789af89da4e7ca890ab/protobuf/pg_query.proto Search the file for the word "location" - there are tons of hits! So that's the starting clue to figure out where in the original string the linter violation was found.
The next tool could be the lexer. The example here clearly shows a CComment token with a location that correlates to a SQL comment in the query: https://docs.rs/pg_query/latest/pg_query/fn.scan.html --- so even though you can't get code comments from the parse tree, you can still get them from the lexer.
Ideally the pg_query library would have an option for passing the output of the scan function to the parse function, so you could inspect the lexer output while also using it to parse. Instead I guess you have to pass the SQL string to the parser as well, which means the lexer runs twice. Oh well. A worthwhile price to pay for this feature IMHO, at least until the pg_query API can be improved in this area. I'm only going to use Squawk in CI/CD to scan a few KB of migrations that are about to be run, not all my historical migrations, so I don't care if Squawk runs slower as a result.
Putting it all together:
- Run the scanner to get a list of tokens. Build a mapping from file line number to the code-comment token(s) on that line, or something like that (we need a way to efficiently look up neighboring lines' comments when a lint violation is found while traversing the parsed AST).
- When a lint violation is found while examining the AST, get the location integer from the parse tree.
- Convert that location to a line number. Look up the same line or the prior line in the mapping built in the first step and see if there was a comment containing a linter directive.
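The steps above can be sketched roughly as follows. This is a minimal illustration, not Squawk's implementation: a regex stands in for pg_query's scan() (which would instead yield CComment/SqlComment tokens with byte offsets), and `is_suppressed` is a hypothetical name.

```python
import bisect
import re

NOLINT_RE = re.compile(r"squawk:\s*nolint")

def line_starts(sql: str) -> list[int]:
    """Byte offsets where each line begins, for offset -> line lookups."""
    starts = [0]
    for i, ch in enumerate(sql):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def nolint_comment_lines(sql: str, starts: list[int]) -> set[int]:
    """0-based line numbers whose `--` comment contains a nolint directive.
    A real implementation would read comment tokens (and their offsets)
    from the lexer instead of this regex stand-in."""
    lines = set()
    for m in re.finditer(r"--[^\n]*", sql):
        if NOLINT_RE.search(m.group()):
            lines.add(bisect.bisect_right(starts, m.start()) - 1)
    return lines

def is_suppressed(violation_offset: int, sql: str) -> bool:
    """True if the statement at `violation_offset` has a nolint comment
    on the same line or the line directly above it."""
    starts = line_starts(sql)
    nolint = nolint_comment_lines(sql, starts)
    line = bisect.bisect_right(starts, violation_offset) - 1
    return line in nolint or (line - 1) in nolint

sql = "-- squawk: nolint\nALTER TABLE foo ALTER COLUMN bar SET NOT NULL;\n"
print(is_suppressed(sql.index("ALTER"), sql))  # True
```

The `bisect` lookup is the offset-to-line conversion from the last step; the parse tree's `location` integers would feed into `violation_offset`.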
Can we just pre-process the file and strip out any SQL statements that are directly preceded by a -- squawk: nolint comment?
@bmorganpa I think that would work. SQL statements are delimited with semi-colons, so that could help with parsing too.
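A naive sketch of that pre-processing idea, just to make the semicolon caveat concrete (the function name is made up, and splitting on bare `;` breaks on semicolons inside string literals or dollar-quoted function bodies, which is exactly where pg_query's split API would be the robust choice):

```python
import re

# Matches a `-- squawk: nolint` comment at the end of a line.
NOLINT = re.compile(r"--\s*squawk:\s*nolint\s*$", re.MULTILINE)

def strip_nolint_statements(sql: str) -> str:
    """Drop any statement preceded by a nolint comment.  After splitting on
    ';', the comment lines between two statements land at the start of the
    *next* chunk, so a chunk containing the directive is the suppressed
    statement."""
    kept = []
    for stmt in sql.split(";"):
        if NOLINT.search(stmt):
            continue  # suppressed statement: drop it entirely
        kept.append(stmt)
    return ";".join(kept)
```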
I'm open to PRs supporting this feature
For now we added a wrapper script around it to handle git diffs, along with a sed command to trim statements marked with squawk:ignore-next-statement: sed '/squawk:ignore-next-statement/,/;$/d'
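That sed range address deletes from the line containing the marker through the first subsequent line ending in a semicolon. For anyone who'd rather not shell out, a rough Python equivalent of the same filter (hypothetical function name, same line-based limitations as the sed version):

```python
def ignore_next_statement(text: str) -> str:
    """Mimic sed '/squawk:ignore-next-statement/,/;$/d': drop lines from a
    marker line through the first later line that ends with ';'."""
    out, skipping = [], False
    for line in text.splitlines(keepends=True):
        if skipping:
            if line.rstrip("\n").endswith(";"):
                skipping = False  # range closes on this line
            continue  # drop every line inside the range
        if "squawk:ignore-next-statement" in line:
            skipping = True  # drop the marker line; range ends at a later ';'
            continue
        out.append(line)
    return "".join(out)
```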