LT05 (layout.long_lines) results in invalid BigQuery code
Search before asking
- [X] I searched the issues and found no similar issues.
What Happened
Running sqlfluff fix on a sample BigQuery query results in invalid SQL.
Expected Behaviour
sqlfluff fix correctly formats the file.
Observed Behaviour
Using test query and config from below:
$ sqlfluff fix test_query.sql
==== finding fixable violations ====
== [test_query.sql] FAIL
L: 4 | P: 1 | LT05 | Line is too long (161 > 120).
| [layout.long_lines]
== [test_query.sql] FIXED
1 fixable linting violations found
$
Output:
SELECT
some_col_a,
some_col_b
FROM {{ params.some_project }}{{ params.some_dataset }}{{ params.some_table_id }}`
For this sample query, I wouldn't expect a LT05 violation as both the non-templated and templated line 4 would be < 120.
How to reproduce
Test SQL:
SELECT
some_col_a,
some_col_b
FROM `{{ params.some_project }}.{{ params.some_dataset }}.{{ params.some_table_id }}`
Using sample config below, run sqlfluff fix test_query.sql.
Dialect
BigQuery
Version
$ sqlfluff --version
sqlfluff, version 3.0.5
$ python -V
Python 3.8.19
Configuration
[sqlfluff]
dialect = bigquery
templater = jinja
exclude_rules = references.keywords, aliasing.forbid, structure.using, structure.column_order, structure.simple_case, capitalisation.identifiers
large_file_skip_byte_limit=0
max_line_length = 120
[sqlfluff:indentation]
indent_unit = space
tab_space_size = 2
[sqlfluff:rules:capitalisation.keywords]
capitalization_policy = upper
[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = upper
[sqlfluff:rules:convention.quoted_literals]
preferred_quoted_literal_style = single_quotes
[sqlfluff:templater:jinja:context]
params = { "some_project": "my_project", "some_dataset": "some_dataset", "some_table_id": "table_id"}
Are you willing to work on and submit a PR to address the issue?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks @tuzonghua - can you confirm whether this issue was present on 3.0.4 - or is it a new issue starting only in 3.0.5?
@alanmcruickshank I just tested on 3.0.4 and came up with the same result. Here's a debug log from that attempt: test_log.log.
I got a bit confused for a bit, but I think I've found the problem. The example you've posted doesn't case the bug, but the log you attached shows why. In the log version, your params is actually:
[sqlfluff:templater:jinja:context]
params = { "some_project": "my_project", "some_dataset": "some_dataset", "some_table_id": "table_id", "adjusted_views_ix": 01, "all_keyword": "All", "all_types_subscription_type_format_name_literal": "All Types", "all_types_subscription_type_literal": "all_types", "card_configs_dataset": "card_configs", "card_source": "card", "change_in_spend_config_table_id": "change_in_spend", "corporate_client_base_mappings_table_id": "corporate_client_base_mappings", "corporate_client_merchant_lookup_table_id": "corporate_client_merchant_lookup", "core_dataset": "core", "corporate_dataset": "corporate", "cross_shop_config_table_id": "cross_shop", "data_project": "test-data-project", "dec_2021_affected_users_table_id": "dec_2021_affected_users", "default_sum_info_id": "111", "earliest_transaction_optimized_date": "2017-01-01", "external_bank_table_id": "bank", "external_card_table_id": "card", "extra_5244_members_table_id": "extra_5244_members", "file_name_column": "FILE_NAME", "corporate_food_index_lookup_table_id": "corporate_food_index_lookup", "hash_mappings_table_id": "hash_mappings", "included_cobrand_ids": "4146,5242", "income_base_table_id": "income_base", "income_dataset": "income", "income_panel_start_year": 2019, "ingestion_dataset": "ingestion", "ingestion_project": "ingestion", "joined_member_panels_table_id": "joined_member_panels", "member_home_geo_table_id": "member_home_geo", "merchant_income_bracket_date": "2017-01-01", "merchant_transaction_panel_v2_table_id": "merchant_transaction_panel_v2", "optimized_date_year": "2017", "panels_dataset": "panels", "period_months": "3", "product_base_table_id": "product_base_table_id", "product_geo_limits_table_id": "product_geo_limits", "query_dataset": "query", "region_state_lookup_table_id": "region_state_lookup", "snapshots_dataset": "test_snapshots_dataset", "spend_279_transactions_table_id": "spend_279_transactions", "standard_cobrand": "5420", "static_dataset": "static", "subscription_base_input_table_id": "test_subscription_base_input", "subscription_geo_table_id": "test_subs_geo", "subscription_view_ix": 32, "subscription_views_dataset": "test_dataset", "subscription_views_input_table_id": "test_subs_views_input", "subscription_views_netflix_price_points_table_id": "test_netflix_price_points_table", "subscription_views_netflix_ungrandfathering_periods_table_id": "test_netflix_ungrandfathering_periods", "subscription_views_output_table_id": "test_subs_views_output", "subscription_views_project": "test-project", "sum_info_id_mapping_table_id": "sum_info_id_mapping", "tax_city_table_id": "tax_city", "tax_income_lookup_table_id": "tax_income_lookup", "transactions_table_id": "transactions", "wm1_fpws_table_id": "test_fpws", "wm1_omnichannel_table_id": "test_omnichannel", "wm1_orders_per_table_id": "test_orders_per", "wm1_sow_table_id": "test_sow", "panel": "complete"}
Importantly this can be reduced to:
[sqlfluff:templater:jinja:context]
params = { "some_project": "my_project", "some_dataset": "some_dataset", "some_table_id": "table_id", "adjusted_views_ix": 01}
Note that the value for adjusted_views_ix is an unquoted 01. If I remove this value - the bug goes away.
I think what's happening here is that your params aren't being parsed properly (albeit silently) and that means the templating is failing - and it's finding a longer string than it should.
Two problems to solve:
- Can we add better warnings when things aren't being parsed properly?
- How do we make sure the bug doesn't happen when parameters are incorrectly set?
In the short term though - I think you can make this go away by correcting the parameters JSON.
@alanmcruickshank Thanks for the help! I'm going to close this for now as the original issue has been solved. I will add that I'm still getting unexpected behavior when the params dictionary is missing a key, e.g., extra \n added or sqlfluff removing a . from a table name ({{ params.some_project }}.{{ params.some_dataset }}.{{ params.some_table_id }} -> {{ params.some_project }}.{{ params.some_dataset }}{{ params.some_table_id }}) but it looks like sqlfluff now returns an error like WARNING Skipping edit patch on uncertain templated section [('mid_point', slice(503, 530, None))], Please report this warning on GitHub along with the query that produced it. which isn't very clear but at least it lets me know there's an issue with the template.