almanac.httparchive.org icon indicating copy to clipboard operation
almanac.httparchive.org copied to clipboard

Bump sqlfluff from 1.4.5 to 3.0.6 in /src

Open dependabot[bot] opened this issue 9 months ago • 14 comments

Bumps sqlfluff from 1.4.5 to 3.0.6.

Release notes

Sourced from sqlfluff's releases.

[3.0.6] - 2024-05-06

Highlights

This release primarily fixes an issue introduced by the recent dbt 1.7.14 release, and better support for dbt 1.7+. It also includes a range of dialect improvements and CLI refinements.

This release also includes the groundwork for linting the unrendered sections of Jinja templates. More documentation on this will be released in due course when it's ready for beta testing.

Thanks also to @​padraic00 & @​burhanyasar who made their first contributions in this release. 🎉🎉🏆🎉🎉

What’s Changed

New Contributors

... (truncated)

Changelog

Sourced from sqlfluff's changelog.

[3.0.6] - 2024-05-06

Highlights

This release primarily fixes an issue introduced by the recent dbt 1.7.14 release, and better support for dbt 1.7+. It also includes a range of dialect improvements and CLI refinements.

This release also includes the groundwork for linting the unrendered sections of Jinja templates. More documentation on this will be released in due course when it's ready for beta testing.

Thanks also to @​padraic00 & @​burhanyasar who made their first contributions in this release. 🎉🎉🏆🎉🎉

What’s Changed

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] avatar May 07 '24 12:05 dependabot[bot]

See some SQL formatting example as a result of: $ sqlfluff fix ../sql/2022/privacy. All makes sense to me.

There are a few issues in 2022 that require a manual fix though:

(.venv) maxostapenko@Maxs-MBP src % sqlfluff lint ../sql/2022 
== [../sql/2022/caching/header_trends.sql] FAIL                                                                                                   
L:  53 | P:  48 | AL08 | Reuse of column alias 'total_using_both' from line 42.                                                                   
                       | [aliasing.unique.column]
L:  54 | P:  56 | AL08 | Reuse of column alias 'total_using_neither' from line
                       | 43. [aliasing.unique.column]
L:  71 | P:  59 | AL08 | Reuse of column alias 'pct_using_both' from line 60.
                       | [aliasing.unique.column]
L:  72 | P:  67 | AL08 | Reuse of column alias 'pct_using_neither' from line 61.
                       | [aliasing.unique.column]
== [../sql/2022/css/meta_unknown_properties.sql] FAIL                                                                                             
L:  51 | P:  45 | AL08 | Reuse of column alias 'total' from line 48.                                                                              
                       | [aliasing.unique.column]
== [../sql/2022/media/video_source_formats.sql] FAIL                                                                                              
L:  11 | P:   5 | AL08 | Reuse of column alias 'video_source_format_type' from                                                                    
                       | line 9. [aliasing.unique.column]
== [../sql/2022/performance/lcp_element_data.sql] FAIL                                                                                            
L: 112 | P:  15 | AL08 | Reuse of column alias 'total' from line 100.                                                                             
                       | [aliasing.unique.column]
== [../sql/2022/security/feature_adoption_by_country.sql] FAIL                                                                                    
L:  28 | P: 116 | AL08 | Reuse of column alias 'pct_csp' from line 27.                                                                            
                       | [aliasing.unique.column]
== [../sql/2022/seo/lighthouse_unused_css_js_by_rank.sql] FAIL                                                                                    
L:  31 | P:   5 | CV09 | Use of blocked regex                                                                                                     
                       | '`httparchive.lighthouse.2022_07_01_*`'.
                       | [convention.blocked_words]
== [../sql/2022/seo/pages-canonical-stats.sql] FAIL                                                                                               
L: 167 | P:   5 | CV09 | Use of blocked regex '`httparchive.pages.2022_07_01_*`'.                                                                 
                       | [convention.blocked_words]
== [../sql/2022/seo/robots-text-size.sql] FAIL                                                                                                    
L:  40 | P:   5 | CV09 | Use of blocked regex '`httparchive.pages.2022_07_01_*`'.                                                                 
                       | [convention.blocked_words]
All Finished 📜 🎉!      

Should we maybe just ignore past years?

max-ostapenko avatar May 19 '24 00:05 max-ostapenko

@rviscomi wasn't happy moving from this:

JOIN (
  SELECT
    _TABLE_SUFFIX AS client,
    COUNT(0) AS total
  FROM
    `httparchive.summary_pages.2022_06_01_*`
  GROUP BY
    _TABLE_SUFFIX)

to this style which v2 and above insists upon:

JOIN (
  SELECT
    _TABLE_SUFFIX AS client,
    COUNT(0) AS total
  FROM
    `httparchive.summary_pages.2022_06_01_*`
  GROUP BY
    _TABLE_SUFFIX
)

And I've had no luck so far getting the SQLFluff maintainers (of which I'm one btw, but don't do much with it these days) to support Rick's preferred bracketing, even optionally.

So that's a sticking point for the upgrade.

Other than that we'd need to update the noqa comments to avoid above flagging and should probably do more changes to the config (as noted in this branch)

tunetheweb avatar May 19 '24 15:05 tunetheweb

Thanks for the context. So the issue is formatting of subquery end bracket.

  1. I see the formatting you're talking about is raising error on layout.indent rule:

Expected line break and no indent before ')'.

This rule is critical for indentation formatting. But I don't see from it's description why is it taking care of newlines.

  1. layout.cte_bracket (another case of formatting the closing bracket) doesn't work as expected as well and has a conflict with layout.indent. See sqlfluff/sqlfluff/issues/5893. This conflict may impact the subqueries brackets too. @tunetheweb What do you think?

  2. Additionally we could introduce custom rule layout.subquery_bracket that would take care of this scenario. But again, the conflict with indentation needs to be resolved.

max-ostapenko avatar May 20 '24 12:05 max-ostapenko

Yeah the layout code is complex (which is why I haven't gone and fixed it myself!). I think we need to add line_position support to [sqlfluff:layout:type:end_bracket] config.

But to be honest I don't disagree with SQLFluff and think I actually prefer it on a new line (like we do with CTEs).

tunetheweb avatar May 20 '24 13:05 tunetheweb

@tunetheweb just to clarify, I actually prefer the closing parenthesis to be on its own line. What I said in https://github.com/HTTPArchive/almanac.httparchive.org/pull/3364#issuecomment-1474346015 was I wasn't (and still am not) a fan of changing from K&R:

FROM (
  SELECT
    _TABLE_SUFFIX AS client,
    hasHeading(payload) AS has_heading
  FROM
    `httparchive.pages.2019_07_01_*`
)

to GNU:

FROM
  (
    SELECT
      _TABLE_SUFFIX AS client,
      hasHeading(payload) AS has_heading
    FROM
      `httparchive.pages.2019_07_01_*`
  )

(Specifically, the extra indentation)

rviscomi avatar May 20 '24 15:05 rviscomi

OK then I think we might be OK then as think that was made optional. Let me try and run fix and see what it looks like.

tunetheweb avatar May 20 '24 15:05 tunetheweb

@rviscomi can you spot check a few files?

Will be good to get this upgraded as long been on my list to resolve this issue!

tunetheweb avatar May 20 '24 15:05 tunetheweb

I would re-enable LT09 - Select targets on new lines.

While I agree with this for top-level queries, it breaks subqueries like this:

https://github.com/HTTPArchive/almanac.httparchive.org/blob/64f8794c9f8bd24de2f46e6b0a721403786430a0/sql/2019/cms/14_15.sql#L24

And IIRC we have a lot of those subqueries and separating them all out would make the code very verbose.

LT08 - Blank line separators for CTEs.

This looks OK to enable. We used to have a lot of these:

WITH cte1 AS (
    SELECT a FROM table1
),
cte2 AS (
    SELECT a FROM table2
)

SELECT * FROM cte1

And we didn't want the space between the CTEs but looks like most of them have spaces now and I added when adding the linter! So looks like we can enable this one.

tunetheweb avatar May 20 '24 21:05 tunetheweb

Looked into LT09 again - what do you mean by 'breaks subqueries'?

I've found 4 files that will be definitely broken by this rule enabled, e.g. sql/2019/performance/07_03d.sql.

But in all other cases these inline subqueries are as if the query says 'no need to read this one, move on' Their FROMs are partly obscured (and surely WHEREs, if present) due to width.

max-ostapenko avatar May 22 '24 14:05 max-ostapenko

Yeah "breaks" is a bit strong. And it is a lot more readable with it enforced:

 FROM
   `httparchive.pages.2019_07_01_*`
 JOIN
-  (SELECT _TABLE_SUFFIX, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
+  (SELECT
+    _TABLE_SUFFIX,
+    COUNT(0) AS total
+  FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
 USING (_TABLE_SUFFIX),
   UNNEST(getElements(payload)) AS element

So maybe it is OK to enable this rule?

For the few we don't want this enabled for we can ignore it with -- noqa: disable=LT09:

#standardSQL
# 07_03d: % fast FCP per PSI by geo
WITH geos AS ( -- noqa: disable=LT09
  SELECT *, 'af' AS geo_code, 'Afghanistan' AS geo, 'Asia' AS region, 'Southern Asia' AS subregion FROM `chrome-ux-report.country_af.201907`
  UNION ALL

tunetheweb avatar May 22 '24 14:05 tunetheweb

Actually looking over the changes with sqlfluff fix for this rule I'm not loving the changes to be honest. The fact it ONLY works on SELECT targets and not the other clauses only half fixes the issues and doesn't really improve readability IMHO.

Examples:

   FROM
     `httparchive.almanac.requests`
   JOIN (
-    SELECT _TABLE_SUFFIX AS client, url AS page, app
+    SELECT
+      _TABLE_SUFFIX AS client,
+      url AS page,
+      app
     FROM `httparchive.technologies.2022_06_01_*`
     WHERE category = 'Ecommerce'
   )
 #standardSQL
 # Top JS frameworks and libraries combinations
-SELECT
-  *
+SELECT *
 FROM (
   SELECT
     client,
 third_party_domains AS (
-  SELECT DISTINCT
-    NET.HOST(domain) AS host
+  SELECT DISTINCT NET.HOST(domain) AS host
   FROM
     `httparchive.almanac.third_parties`
 ),

In fact in there's very few cases where I do think it improves things. Maybe this one cause it only has a FROM clause (but that's very much the exception and most will have WHERE clauses or GROUP BY)?

 FROM
-  (SELECT _TABLE_SUFFIX AS client, url AS page, getCompliantElements(payload) AS compliant_elements FROM `httparchive.pages.2019_07_01_*`)
+  (SELECT
+    _TABLE_SUFFIX AS client,
+    url AS page,
+    getCompliantElements(payload) AS compliant_elements
+  FROM `httparchive.pages.2019_07_01_*`)
 JOIN

tunetheweb avatar May 22 '24 15:05 tunetheweb

Regarding indentation discussion, here's a config to get to the expected result:

[sqlfluff:layout:type:start_bracket]
spacing_before = single:inline

If seems OK, I'll push the changes and we'll have a fresh look.

And while I agree with @tunetheweb that inline opening bracket seems strange in expressions, I don't know a way to distinguish between expressions and subqueries. Moving opening bracket inline also decreases the overall amount of indents (GNU -> K&R).

So we'll get:

sql/2019/media/04_09c.sql b/sql/2019/media/04_09c.sql
-          `httparchive.almanac.summary_response_bodies`
-        WHERE
-          date = '2019-07-01' AND
-          firstHtml AND
-          (
-            REGEXP_CONTAINS(body, r'(?im)<meta[^><]*Accept-CH\b') OR
-            REGEXP_CONTAINS(respOtherHeaders, r'(?im)Accept-CH = ')
-          )
+      `httparchive.almanac.summary_response_bodies`
+    WHERE
+      date = '2019-07-01' AND
+      firstHtml AND (
+        REGEXP_CONTAINS(body, r'(?im)<meta[^><]*Accept-CH\b') OR
+        REGEXP_CONTAINS(respOtherHeaders, r'(?im)Accept-CH = ')
+      )

Some more examples (a lot of things go inline): 1.

2019/accessibility/09_02.sql b/sql/2019/accessibility/09_02.sql
-FROM
-  (SELECT _TABLE_SUFFIX AS client, url AS page, getMainCount(payload) AS main_elements FROM `httparchive.pages.2019_07_01_*`)
-JOIN
-  (SELECT client, page, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(body, '(?i)role=[\'"]?main')) AS main_roles FROM `httparchive.almanac.summary_response_bodies` WHERE date = '2019-07-01' AND firstHtml)
-USING
-  (client, page)
-JOIN
-  (SELECT _TABLE_SUFFIX AS client, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
+FROM (SELECT _TABLE_SUFFIX AS client, url AS page, getMainCount(payload) AS main_elements FROM `httparchive.pages.2019_07_01_*`)
+JOIN (SELECT client, page, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(body, '(?i)role=[\'"]?main')) AS main_roles FROM `httparchive.almanac.summary_response_bodies` WHERE date = '2019-07-01' AND firstHtml)
+USING (client, page)
+JOIN (SELECT _TABLE_SUFFIX AS client, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
2020/markup/pages_element_count_by_device_and_custom_dash_elements.sql
-CREATE TEMP FUNCTION AS_PERCENT(freq FLOAT64, total FLOAT64) RETURNS FLOAT64 AS (
-  ROUND(SAFE_DIVIDE(freq, total), 4)
+CREATE TEMP FUNCTION AS_PERCENT(freq FLOAT64, total FLOAT64) RETURNS FLOAT64 AS (ROUND(SAFE_DIVIDE(freq, total), 4)
2019/performance/07_22.sql
   SELECT
-    _TABLE_SUFFIX AS client,
-    (
+    _TABLE_SUFFIX AS client, (
       CAST(IFNULL(JSON_EXTRACT(payload, "$['_cpu.Paint']"), '0') AS INT64) +
       CAST(IFNULL(JSON_EXTRACT(payload, "$['_cpu.UpdateLayerTree']"), '0') AS INT64)
     ) AS paint_cpu_time
2020/markup/pages_markup_by_device.sql
-FROM
-  (
-    SELECT
-      _TABLE_SUFFIX AS client,
-      get_markup_info(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS markup_info
-    FROM
-      `httparchive.pages.2020_08_01_*`
-  )
+FROM (
+  SELECT
+    _TABLE_SUFFIX AS client,
+    get_markup_info(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS markup_info
+  FROM
+    `httparchive.pages.2020_08_01_*`
+)

max-ostapenko avatar May 22 '24 21:05 max-ostapenko

A newer version of sqlfluff exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as normal once this PR is merged.

dependabot[bot] avatar May 23 '24 12:05 dependabot[bot]

Pushed all the file updates corresponding to new indents. @tunetheweb @rviscomi thoughts?

max-ostapenko avatar Jun 04 '24 23:06 max-ostapenko