almanac.httparchive.org
almanac.httparchive.org copied to clipboard
Bump sqlfluff from 1.4.5 to 3.0.6 in /src
Bumps sqlfluff from 1.4.5 to 3.0.6.
Release notes
Sourced from sqlfluff's releases.
[3.0.6] - 2024-05-06
Highlights
This release primarily fixes an issue introduced by the recent dbt 1.7.14 release, and better support for dbt 1.7+. It also includes a range of dialect improvements and CLI refinements.
This release also includes the groundwork for linting the unrendered sections of Jinja templates. More documentation on this will be released in due course when it's ready for beta testing.
Thanks also to
@padraic00
&@burhanyasar
who made their first contributions in this release. 🎉🎉🏆🎉🎉What’s Changed
- [fix_clickhouse] Temporary Table Create AS SELECT (#5843)
@konnectr
- Bugfix: ST02 - Compare entire condition expression (#5850)
@WittierDinosaur
- Clichouse prewhere (#5849)
@konnectr
- BigQuery: Support missing DROP statements (#5848)
@kzosabe
- BigQuery: various CREATE statements (#5846)
@greg-finley
- BigQuery Alter Schema (#5835)
@greg-finley
- Snowflake execute immediate from (#5836)
@greg-finley
- Support dbt 1.7 (#5842)
@WittierDinosaur
- Postgres: Create extension cascade (#5834)
@greg-finley
- Postgres: Add Support for PostGIS operators (#5830)
@burhanyasar
- Db2: Support additional CREATE INDEX options (#5827)
@keraion
- Allow to align all siblings when respacing (#5826)
@borchero
- BigQuery: Support EXECUTE IMMEDIATE (#5820)
@keraion
- BigQuery: Support CREATE ROW ACCESS POLICY statement (#5821)
@kzosabe
- Fix Jinja variant location correction (#5814)
@alanmcruickshank
- Test cases for linter fails. (#5815)
@alanmcruickshank
- BigQuery: Support nested BEGIN, Fix CREATE PROCEDURE OPTIONS (#5816)
@keraion
- Bring multiple jinja variants through to the parser. (#5794)
@alanmcruickshank
- Fix placeholder labelling (#5813)
@alanmcruickshank
- Tighten up the return from .process() (#5810)
@alanmcruickshank
- BigQuery: Support CREATE MATERIALIZED VIEW AS REPLICA OF (#5811)
@kzosabe
- BigQuery: Support OPTIONS in CREATE FUNCTION statement (#5812)
@kzosabe
- TSQL: fix
ALTER TABLE ... SWITCH PARTITION
(#5807)@keen85
- SparkSQL: Add functions that use UNIT keywords (#5806)
@keraion
- CLI: Add
--stdin-filename
option (#5805)@keraion
- TSQL: parse
CREATE/ALTER/DROP MASTER KEY
(#5802)@keen85
- Jinja Variant Configuration (#5785)
@alanmcruickshank
- Small refactor in jinja templater (#5786)
@alanmcruickshank
- BigQuery: Support FOR SYSTEM_TIME AS OF in CREATE TABLE CLONE statement (#5798)
@kzosabe
- TSQL: support for
CREATE/ALTER PARTITION FUNCTION/SCHEME
(#5793)@keen85
- BigQuery: Support DEFAULT COLLATE segment (#5790)
@kzosabe
- TSQL: support computed columns (#5792)
@keen85
- Simplify one of the lexer methods (#5788)
@alanmcruickshank
- Improve light colour highlight (#5784)
@alanmcruickshank
- SparkSQL: Support TIMESTAMP_LTZ and TIMESTAMP_NTZ types (#5783)
@padraic00
New Contributors
@padraic00
made their first contribution in #5783
... (truncated)
Changelog
Sourced from sqlfluff's changelog.
[3.0.6] - 2024-05-06
Highlights
This release primarily fixes an issue introduced by the recent dbt 1.7.14 release, and better support for dbt 1.7+. It also includes a range of dialect improvements and CLI refinements.
This release also includes the groundwork for linting the unrendered sections of Jinja templates. More documentation on this will be released in due course when it's ready for beta testing.
Thanks also to
@padraic00
&@burhanyasar
who made their first contributions in this release. 🎉🎉🏆🎉🎉What’s Changed
- [fix_clickhouse] Temporary Table Create AS SELECT #5843
@konnectr
- Bugfix: ST02 - Compare entire condition expression #5850
@WittierDinosaur
- Clichouse prewhere #5849
@konnectr
- BigQuery: Support missing DROP statements #5848
@kzosabe
- BigQuery: various CREATE statements #5846
@greg-finley
- BigQuery Alter Schema #5835
@greg-finley
- Snowflake execute immediate from #5836
@greg-finley
- Support dbt 1.7 #5842
@WittierDinosaur
- Postgres: Create extension cascade #5834
@greg-finley
- Postgres: Add Support for PostGIS operators #5830
@burhanyasar
- Db2: Support additional CREATE INDEX options #5827
@keraion
- Allow to align all siblings when respacing #5826
@borchero
- BigQuery: Support EXECUTE IMMEDIATE #5820
@keraion
- BigQuery: Support CREATE ROW ACCESS POLICY statement #5821
@kzosabe
- Fix Jinja variant location correction #5814
@alanmcruickshank
- Test cases for linter fails. #5815
@alanmcruickshank
- BigQuery: Support nested BEGIN, Fix CREATE PROCEDURE OPTIONS #5816
@keraion
- Bring multiple jinja variants through to the parser. #5794
@alanmcruickshank
- Fix placeholder labelling #5813
@alanmcruickshank
- Tighten up the return from .process() #5810
@alanmcruickshank
- BigQuery: Support CREATE MATERIALIZED VIEW AS REPLICA OF #5811
@kzosabe
- BigQuery: Support OPTIONS in CREATE FUNCTION statement #5812
@kzosabe
- TSQL: fix
ALTER TABLE ... SWITCH PARTITION
#5807@keen85
- SparkSQL: Add functions that use UNIT keywords #5806
@keraion
- CLI: Add
--stdin-filename
option #5805@keraion
- TSQL: parse
CREATE/ALTER/DROP MASTER KEY
#5802@keen85
- Jinja Variant Configuration #5785
@alanmcruickshank
- Small refactor in jinja templater #5786
@alanmcruickshank
- BigQuery: Support FOR SYSTEM_TIME AS OF in CREATE TABLE CLONE statement #5798
@kzosabe
- TSQL: support for
CREATE/ALTER PARTITION FUNCTION/SCHEME
#5793@keen85
- BigQuery: Support DEFAULT COLLATE segment #5790
@kzosabe
- TSQL: support computed columns #5792
@keen85
... (truncated)
Commits
75cf648
Prep version 3.0.6 (#5853)4838e89
[fix_clickhouse] Temporary Table Create AS SELECT (#5843)99b09d4
Bugfix: ST02 - Compare entire condition expression (#5850)49d48a5
Clichouse prewhere (#5849)d48ded5
BigQuery: Support missing DROP statements (#5848)019922a
BigQuery: various CREATE statements (#5846)4f6760e
BigQuery Alter Schema (#5835)d2e448a
Snowflake execute immediate from (#5836)b326066
Support dbt 1.7 (#5842)baceed9
Postgres: Create extension cascade (#5834)- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
-
@dependabot rebase
will rebase this PR -
@dependabot recreate
will recreate this PR, overwriting any edits that have been made to it -
@dependabot merge
will merge this PR after your CI passes on it -
@dependabot squash and merge
will squash and merge this PR after your CI passes on it -
@dependabot cancel merge
will cancel a previously requested merge and block automerging -
@dependabot reopen
will reopen this PR if it is closed -
@dependabot close
will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually -
@dependabot show <dependency name> ignore conditions
will show all of the ignore conditions of the specified dependency -
@dependabot ignore this major version
will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) -
@dependabot ignore this minor version
will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) -
@dependabot ignore this dependency
will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
See some SQL formatting example as a result of: $ sqlfluff fix ../sql/2022/privacy
. All makes sense to me.
There are a few issues in 2022 that require a manual fix though:
(.venv) maxostapenko@Maxs-MBP src % sqlfluff lint ../sql/2022
== [../sql/2022/caching/header_trends.sql] FAIL
L: 53 | P: 48 | AL08 | Reuse of column alias 'total_using_both' from line 42.
| [aliasing.unique.column]
L: 54 | P: 56 | AL08 | Reuse of column alias 'total_using_neither' from line
| 43. [aliasing.unique.column]
L: 71 | P: 59 | AL08 | Reuse of column alias 'pct_using_both' from line 60.
| [aliasing.unique.column]
L: 72 | P: 67 | AL08 | Reuse of column alias 'pct_using_neither' from line 61.
| [aliasing.unique.column]
== [../sql/2022/css/meta_unknown_properties.sql] FAIL
L: 51 | P: 45 | AL08 | Reuse of column alias 'total' from line 48.
| [aliasing.unique.column]
== [../sql/2022/media/video_source_formats.sql] FAIL
L: 11 | P: 5 | AL08 | Reuse of column alias 'video_source_format_type' from
| line 9. [aliasing.unique.column]
== [../sql/2022/performance/lcp_element_data.sql] FAIL
L: 112 | P: 15 | AL08 | Reuse of column alias 'total' from line 100.
| [aliasing.unique.column]
== [../sql/2022/security/feature_adoption_by_country.sql] FAIL
L: 28 | P: 116 | AL08 | Reuse of column alias 'pct_csp' from line 27.
| [aliasing.unique.column]
== [../sql/2022/seo/lighthouse_unused_css_js_by_rank.sql] FAIL
L: 31 | P: 5 | CV09 | Use of blocked regex
| '`httparchive.lighthouse.2022_07_01_*`'.
| [convention.blocked_words]
== [../sql/2022/seo/pages-canonical-stats.sql] FAIL
L: 167 | P: 5 | CV09 | Use of blocked regex '`httparchive.pages.2022_07_01_*`'.
| [convention.blocked_words]
== [../sql/2022/seo/robots-text-size.sql] FAIL
L: 40 | P: 5 | CV09 | Use of blocked regex '`httparchive.pages.2022_07_01_*`'.
| [convention.blocked_words]
All Finished 📜 🎉!
Should we maybe just ignore past years?
@rviscomi wasn't happy moving from this:
JOIN (
SELECT
_TABLE_SUFFIX AS client,
COUNT(0) AS total
FROM
`httparchive.summary_pages.2022_06_01_*`
GROUP BY
_TABLE_SUFFIX)
to this style which v2 and above insists upon:
JOIN (
SELECT
_TABLE_SUFFIX AS client,
COUNT(0) AS total
FROM
`httparchive.summary_pages.2022_06_01_*`
GROUP BY
_TABLE_SUFFIX
)
And I've had no luck so far getting the SQLFluff maintainers (of which I'm one btw, but don't do much with it these days) to support Rick's preferred bracketing, even optionally.
So that's a sticking point for the upgrade.
Other than that we'd need to update the noqa
comments to avoid above flagging and should probably do more changes to the config (as noted in this branch)
Thanks for the context. So the issue is formatting of subquery end bracket.
- I see the formatting you're talking about is raising error on
layout.indent
rule:
Expected line break and no indent before ')'.
This rule is critical for indentation formatting. But I don't see from it's description why is it taking care of newlines.
-
layout.cte_bracket
(another case of formatting the closing bracket) doesn't work as expected as well and has a conflict withlayout.indent
. See sqlfluff/sqlfluff/issues/5893. This conflict may impact the subqueries brackets too. @tunetheweb What do you think? -
Additionally we could introduce custom rule
layout.subquery_bracket
that would take care of this scenario. But again, the conflict with indentation needs to be resolved.
Yeah the layout code is complex (which is why I haven't gone and fixed it myself!). I think we need to add line_position
support to [sqlfluff:layout:type:end_bracket]
config.
But to be honest I don't disagree with SQLFluff and think I actually prefer it on a new line (like we do with CTEs).
@tunetheweb just to clarify, I actually prefer the closing parenthesis to be on its own line. What I said in https://github.com/HTTPArchive/almanac.httparchive.org/pull/3364#issuecomment-1474346015 was I wasn't (and still am not) a fan of changing from K&R:
FROM (
SELECT
_TABLE_SUFFIX AS client,
hasHeading(payload) AS has_heading
FROM
`httparchive.pages.2019_07_01_*`
)
to GNU:
FROM
(
SELECT
_TABLE_SUFFIX AS client,
hasHeading(payload) AS has_heading
FROM
`httparchive.pages.2019_07_01_*`
)
(Specifically, the extra indentation)
OK then I think we might be OK then as think that was made optional. Let me try and run fix and see what it looks like.
@rviscomi can you spot check a few files?
Will be good to get this upgraded as long been on my list to resolve this issue!
I would re-enable LT09 - Select targets on new lines.
While I agree with this for top-level queries, it breaks subqueries like this:
https://github.com/HTTPArchive/almanac.httparchive.org/blob/64f8794c9f8bd24de2f46e6b0a721403786430a0/sql/2019/cms/14_15.sql#L24
And IIRC we have a lot of those subqueries and separating them all out would make the code very verbose.
LT08 - Blank line separators for CTEs.
This looks OK to enable. We used to have a lot of these:
WITH cte1 AS (
SELECT a FROM table1
),
cte2 AS (
SELECT a FROM table2
)
SELECT * FROM cte1
And we didn't want the space between the CTEs but looks like most of them have spaces now and I added when adding the linter! So looks like we can enable this one.
Looked into LT09 again - what do you mean by 'breaks subqueries'?
I've found 4 files that will be definitely broken by this rule enabled, e.g. sql/2019/performance/07_03d.sql.
But in all other cases these inline subqueries are as if the query says 'no need to read this one, move on' Their FROMs are partly obscured (and surely WHEREs, if present) due to width.
Yeah "breaks" is a bit strong. And it is a lot more readable with it enforced:
FROM
`httparchive.pages.2019_07_01_*`
JOIN
- (SELECT _TABLE_SUFFIX, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
+ (SELECT
+ _TABLE_SUFFIX,
+ COUNT(0) AS total
+ FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
USING (_TABLE_SUFFIX),
UNNEST(getElements(payload)) AS element
So maybe it is OK to enable this rule?
For the few we don't want this enabled for we can ignore it with -- noqa: disable=LT09
:
#standardSQL
# 07_03d: % fast FCP per PSI by geo
WITH geos AS ( -- noqa: disable=LT09
SELECT *, 'af' AS geo_code, 'Afghanistan' AS geo, 'Asia' AS region, 'Southern Asia' AS subregion FROM `chrome-ux-report.country_af.201907`
UNION ALL
Actually looking over the changes with sqlfluff fix
for this rule I'm not loving the changes to be honest. The fact it ONLY works on SELECT
targets and not the other clauses only half fixes the issues and doesn't really improve readability IMHO.
Examples:
FROM
`httparchive.almanac.requests`
JOIN (
- SELECT _TABLE_SUFFIX AS client, url AS page, app
+ SELECT
+ _TABLE_SUFFIX AS client,
+ url AS page,
+ app
FROM `httparchive.technologies.2022_06_01_*`
WHERE category = 'Ecommerce'
)
#standardSQL
# Top JS frameworks and libraries combinations
-SELECT
- *
+SELECT *
FROM (
SELECT
client,
third_party_domains AS (
- SELECT DISTINCT
- NET.HOST(domain) AS host
+ SELECT DISTINCT NET.HOST(domain) AS host
FROM
`httparchive.almanac.third_parties`
),
In fact in there's very few cases where I do think it improves things. Maybe this one cause it only has a FROM
clause (but that's very much the exception and most will have WHERE
clauses or GROUP BY
)?
FROM
- (SELECT _TABLE_SUFFIX AS client, url AS page, getCompliantElements(payload) AS compliant_elements FROM `httparchive.pages.2019_07_01_*`)
+ (SELECT
+ _TABLE_SUFFIX AS client,
+ url AS page,
+ getCompliantElements(payload) AS compliant_elements
+ FROM `httparchive.pages.2019_07_01_*`)
JOIN
Regarding indentation discussion, here's a config to get to the expected result:
[sqlfluff:layout:type:start_bracket]
spacing_before = single:inline
If seems OK, I'll push the changes and we'll have a fresh look.
And while I agree with @tunetheweb that inline opening bracket seems strange in expressions, I don't know a way to distinguish between expressions and subqueries. Moving opening bracket inline also decreases the overall amount of indents (GNU -> K&R).
So we'll get:
sql/2019/media/04_09c.sql b/sql/2019/media/04_09c.sql
- `httparchive.almanac.summary_response_bodies`
- WHERE
- date = '2019-07-01' AND
- firstHtml AND
- (
- REGEXP_CONTAINS(body, r'(?im)<meta[^><]*Accept-CH\b') OR
- REGEXP_CONTAINS(respOtherHeaders, r'(?im)Accept-CH = ')
- )
+ `httparchive.almanac.summary_response_bodies`
+ WHERE
+ date = '2019-07-01' AND
+ firstHtml AND (
+ REGEXP_CONTAINS(body, r'(?im)<meta[^><]*Accept-CH\b') OR
+ REGEXP_CONTAINS(respOtherHeaders, r'(?im)Accept-CH = ')
+ )
Some more examples (a lot of things go inline): 1.
2019/accessibility/09_02.sql b/sql/2019/accessibility/09_02.sql
-FROM
- (SELECT _TABLE_SUFFIX AS client, url AS page, getMainCount(payload) AS main_elements FROM `httparchive.pages.2019_07_01_*`)
-JOIN
- (SELECT client, page, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(body, '(?i)role=[\'"]?main')) AS main_roles FROM `httparchive.almanac.summary_response_bodies` WHERE date = '2019-07-01' AND firstHtml)
-USING
- (client, page)
-JOIN
- (SELECT _TABLE_SUFFIX AS client, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
+FROM (SELECT _TABLE_SUFFIX AS client, url AS page, getMainCount(payload) AS main_elements FROM `httparchive.pages.2019_07_01_*`)
+JOIN (SELECT client, page, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(body, '(?i)role=[\'"]?main')) AS main_roles FROM `httparchive.almanac.summary_response_bodies` WHERE date = '2019-07-01' AND firstHtml)
+USING (client, page)
+JOIN (SELECT _TABLE_SUFFIX AS client, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
2020/markup/pages_element_count_by_device_and_custom_dash_elements.sql
-CREATE TEMP FUNCTION AS_PERCENT(freq FLOAT64, total FLOAT64) RETURNS FLOAT64 AS (
- ROUND(SAFE_DIVIDE(freq, total), 4)
+CREATE TEMP FUNCTION AS_PERCENT(freq FLOAT64, total FLOAT64) RETURNS FLOAT64 AS (ROUND(SAFE_DIVIDE(freq, total), 4)
2019/performance/07_22.sql
SELECT
- _TABLE_SUFFIX AS client,
- (
+ _TABLE_SUFFIX AS client, (
CAST(IFNULL(JSON_EXTRACT(payload, "$['_cpu.Paint']"), '0') AS INT64) +
CAST(IFNULL(JSON_EXTRACT(payload, "$['_cpu.UpdateLayerTree']"), '0') AS INT64)
) AS paint_cpu_time
2020/markup/pages_markup_by_device.sql
-FROM
- (
- SELECT
- _TABLE_SUFFIX AS client,
- get_markup_info(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS markup_info
- FROM
- `httparchive.pages.2020_08_01_*`
- )
+FROM (
+ SELECT
+ _TABLE_SUFFIX AS client,
+ get_markup_info(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS markup_info
+ FROM
+ `httparchive.pages.2020_08_01_*`
+)
A newer version of sqlfluff exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as normal once this PR is merged.
Pushed all the file updates corresponding to new indents. @tunetheweb @rviscomi thoughts?