sqlfluff
sqlfluff copied to clipboard
Rule L003 and hanging indents
L003 errors when indenting an expression that breaks over multiple lines within a SELECT. It seems the underlying decision is whether a comma opens a new indent level or not
Example
-- yes
SELECT
foo
, SUM(
some_long_expression
)
-- no
SELECT
foo
, SUM(
some_long_expression
)
Argument: This indent style enhances readability so that an expression looks the same no matter where it is and enables us to scan vertically.
Reproduce by
import textwrap
from sqlfluff.core import Linter
from sqlfluff.core.config import FluffConfig
linter = Linter(
config=FluffConfig(
configs={
"rules": {
"comma_style": "leading",
"tab_space_size": 2,
},
}
)
)
linter.lint_string_wrapped(
textwrap.dedent(
"""\
SELECT
foo
, SUM(
some_long_expression
) AS example
FROM tbl
"""
)
).as_records()
produces
[{'filepath': '<string input>',
'violations': [{'line_no': 4,
'line_pos': 7,
'code': 'L003',
'description': 'Line over-indented compared to line #3'},
{'line_no': 5,
'line_pos': 5,
'code': 'L003',
'description': 'Indentation not consistent with line #3'}]}]
For reference, this SQL passes linting
linter.lint_string_wrapped(
textwrap.dedent(
"""\
SELECT
foo
, SUM(
some_long_expression
) AS example
FROM tbl
"""
)
).as_records()
@tunetheweb Any hints on where to get started if I wanted to try creating a PR for this?
Sorry missed this earlier. I'm not the most familiar with rule L003 but do know it's one of the more complicated (if not the most complicated!) ones.
@barrywhart any thoughts here?
Not really. I mean, the "hanging indent" code is identified with comments, so it wouldn't be difficult to locate.
The challenge I've found working on the rule is that there's often not a "separation of concerns" -- the code does a dozen different things, and often the same few lines of code affect several of these. The code has lots of tests, so I'm not too worried someone would break it, just be prepared to feel confused at first and spend more time on this than you may expect.
Yeah this rule is causing a bit of headache when dealing with long lines. As an example:
SELECT
field_one,
-- Shouldn't line continuations be indented?
min(event_tracking_activity_id)
OVER (PARTITION BY some_id_used_to_partition) = event_tracking_activity_id
AS is_first_val
FROM
my_table
I would think that line continuations, needed either for readability or to prevent the line from being too long, would be indented no? But this results in L003 | Indentation not consistent with line #..
So there's a development of this with #3942.
The first two examples are now both corrected to:
SELECT
foo
, SUM(
some_long_expression
)
The comma doesn't open a new indent. The logic laid out in the new docs sets out roughly why this is the case, but we no longer support hanging indention. In the given example, an indent between , and SUM is either taken or untaken. For it to be taken, we'd need there to be newline there.
Supporting an option where there's effectively a double indent could be possible, but I think it would be very tough. I suggest we get the new indent algorithms merged before we attempt to build that on top.
The other query posted here is a different issue and perhaps more solvable. That currently fixes to:
SELECT
field_one,
-- Shouldn't line continuations be indented?
min(event_tracking_activity_id)
OVER (PARTITION BY some_id_used_to_partition) = event_tracking_activity_id
AS is_first_val
FROM
my_table
This is because there's currently no indent segment between the first function call and the rest of the line. That would be fairly easy to add though. Given that's kind of a totally separate issue though - I'm going to add that as a new issue.
I'm not planning to address the first issue here in the current reflow push because I think it makes indentation dependent on comma position:
SELECT
foo,
-- single indent with trailing commas. This is default functionality.
-- (shown with the default 4 space indent)
SUM(
my_expression
);
SELECT
foo,
-- double indent with leading commas and an indent unit of 2 spaces.
-- This is what I think was requested by the OP.
, SUM(
my_expression
);
SELECT
foo,
-- double indent with leading commas. It looks strange with the default indent unit = 4 spaces.
, SUM(
my_expression
);
Given that I think the rationale for this only makes sense in the specific case of leading commas and an indent unit of 2 spaces, I think it's tricky to make it consistent across the board.
Hey @alanmcruickshank - thanks for looking at this one again. You've got a great point that the 2-space indent does look a lot cleaner. For completeness, with 4-space indents I'd imagine the spacing to look like this
SELECT
foo
-- double indent with leading commas
, SUM(
my_expression
)
, bar
FROM
Where the SUM's line continuation "resets" the indent base. I believe that lets humans quickly scan over items in the select (just scan down for more commas) and also into the SUM expression.
For comparison, this has an analogy to suqueries
-- indent=4-spaces
SELECT
...
FROM (
SELECT ...
)
Ok - so I think the only additional feature we'd need to achieve this is allowing an optional additional half indent (i.e. two spaces more than usual) for closing brackets within comma separated lists.
That isn't a huge deviation for the planned logic. I'm not planning on addressing it straight away, but I can imagine it being a configurable edge case we could support.