sqlfluff icon indicating copy to clipboard operation
sqlfluff copied to clipboard

LT05 (layout.long_lines) results in invalid BigQuery code

Open tuzonghua opened this issue 1 year ago • 3 comments

Search before asking

  • [X] I searched the issues and found no similar issues.

What Happened

Running sqlfluff fix on a sample BigQuery query results in invalid SQL.

Expected Behaviour

sqlfluff fix correctly formats the file.

Observed Behaviour

Using test query and config from below:

$ sqlfluff fix test_query.sql
==== finding fixable violations ====
== [test_query.sql] FAIL
L:   4 | P:   1 | LT05 | Line is too long (161 > 120).
                       | [layout.long_lines]
== [test_query.sql] FIXED
1 fixable linting violations found
$

Output:

SELECT
  some_col_a,
  some_col_b
FROM {{ params.some_project }}{{ params.some_dataset }}{{ params.some_table_id }}`

For this sample query, I wouldn't expect a LT05 violation as both the non-templated and templated line 4 would be < 120.

How to reproduce

Test SQL:

SELECT
  some_col_a,
  some_col_b
FROM `{{ params.some_project }}.{{ params.some_dataset }}.{{ params.some_table_id }}`

Using sample config below, run sqlfluff fix test_query.sql.

Dialect

BigQuery

Version

$ sqlfluff --version
sqlfluff, version 3.0.5
$ python -V
Python 3.8.19

Configuration

[sqlfluff]
dialect = bigquery
templater = jinja
exclude_rules = references.keywords, aliasing.forbid, structure.using, structure.column_order, structure.simple_case, capitalisation.identifiers
large_file_skip_byte_limit=0
max_line_length = 120

[sqlfluff:indentation]
indent_unit = space
tab_space_size = 2

[sqlfluff:rules:capitalisation.keywords]
capitalization_policy = upper

[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = upper

[sqlfluff:rules:convention.quoted_literals]
preferred_quoted_literal_style = single_quotes

[sqlfluff:templater:jinja:context]
params = { "some_project": "my_project", "some_dataset": "some_dataset", "some_table_id": "table_id"}

Are you willing to work on and submit a PR to address the issue?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

tuzonghua avatar Apr 19 '24 20:04 tuzonghua

Thanks @tuzonghua - can you confirm whether this issue was present on 3.0.4 - or is it a new issue starting only in 3.0.5?

alanmcruickshank avatar Apr 20 '24 11:04 alanmcruickshank

@alanmcruickshank I just tested on 3.0.4 and came up with the same result. Here's a debug log from that attempt: test_log.log.

tuzonghua avatar Apr 20 '24 12:04 tuzonghua

I got a bit confused for a bit, but I think I've found the problem. The example you've posted doesn't case the bug, but the log you attached shows why. In the log version, your params is actually:

[sqlfluff:templater:jinja:context]
params = { "some_project": "my_project", "some_dataset": "some_dataset", "some_table_id": "table_id", "adjusted_views_ix": 01, "all_keyword": "All", "all_types_subscription_type_format_name_literal": "All Types", "all_types_subscription_type_literal": "all_types", "card_configs_dataset": "card_configs", "card_source": "card", "change_in_spend_config_table_id": "change_in_spend", "corporate_client_base_mappings_table_id": "corporate_client_base_mappings", "corporate_client_merchant_lookup_table_id": "corporate_client_merchant_lookup", "core_dataset": "core", "corporate_dataset": "corporate", "cross_shop_config_table_id": "cross_shop", "data_project": "test-data-project", "dec_2021_affected_users_table_id": "dec_2021_affected_users", "default_sum_info_id": "111", "earliest_transaction_optimized_date": "2017-01-01", "external_bank_table_id": "bank",  "external_card_table_id": "card", "extra_5244_members_table_id": "extra_5244_members", "file_name_column": "FILE_NAME", "corporate_food_index_lookup_table_id": "corporate_food_index_lookup", "hash_mappings_table_id": "hash_mappings", "included_cobrand_ids": "4146,5242", "income_base_table_id": "income_base", "income_dataset": "income", "income_panel_start_year": 2019, "ingestion_dataset": "ingestion", "ingestion_project": "ingestion", "joined_member_panels_table_id": "joined_member_panels", "member_home_geo_table_id": "member_home_geo", "merchant_income_bracket_date": "2017-01-01", "merchant_transaction_panel_v2_table_id": "merchant_transaction_panel_v2", "optimized_date_year": "2017", "panels_dataset": "panels", "period_months": "3", "product_base_table_id": "product_base_table_id", "product_geo_limits_table_id": "product_geo_limits", "query_dataset": "query", "region_state_lookup_table_id": "region_state_lookup", "snapshots_dataset": "test_snapshots_dataset", "spend_279_transactions_table_id": "spend_279_transactions", "standard_cobrand": "5420", "static_dataset": "static", "subscription_base_input_table_id": "test_subscription_base_input", "subscription_geo_table_id": "test_subs_geo", "subscription_view_ix": 32, "subscription_views_dataset": "test_dataset", "subscription_views_input_table_id": "test_subs_views_input", "subscription_views_netflix_price_points_table_id": "test_netflix_price_points_table", "subscription_views_netflix_ungrandfathering_periods_table_id": "test_netflix_ungrandfathering_periods", "subscription_views_output_table_id": "test_subs_views_output", "subscription_views_project": "test-project", "sum_info_id_mapping_table_id": "sum_info_id_mapping", "tax_city_table_id": "tax_city", "tax_income_lookup_table_id": "tax_income_lookup", "transactions_table_id": "transactions", "wm1_fpws_table_id": "test_fpws", "wm1_omnichannel_table_id": "test_omnichannel", "wm1_orders_per_table_id": "test_orders_per", "wm1_sow_table_id": "test_sow", "panel": "complete"}

Importantly this can be reduced to:

[sqlfluff:templater:jinja:context]
params = { "some_project": "my_project", "some_dataset": "some_dataset", "some_table_id": "table_id", "adjusted_views_ix": 01}

Note that the value for adjusted_views_ix is an unquoted 01. If I remove this value - the bug goes away.

I think what's happening here is that your params aren't being parsed properly (albeit silently) and that means the templating is failing - and it's finding a longer string than it should.

Two problems to solve:

  1. Can we add better warnings when things aren't being parsed properly?
  2. How do we make sure the bug doesn't happen when parameters are incorrectly set?

In the short term though - I think you can make this go away by correcting the parameters JSON.

alanmcruickshank avatar Apr 20 '24 19:04 alanmcruickshank

@alanmcruickshank Thanks for the help! I'm going to close this for now as the original issue has been solved. I will add that I'm still getting unexpected behavior when the params dictionary is missing a key, e.g., extra \n added or sqlfluff removing a . from a table name ({{ params.some_project }}.{{ params.some_dataset }}.{{ params.some_table_id }} -> {{ params.some_project }}.{{ params.some_dataset }}{{ params.some_table_id }}) but it looks like sqlfluff now returns an error like WARNING Skipping edit patch on uncertain templated section [('mid_point', slice(503, 530, None))], Please report this warning on GitHub along with the query that produced it. which isn't very clear but at least it lets me know there's an issue with the template.

tuzonghua avatar Jul 02 '24 20:07 tuzonghua