dbt-ml-preprocessing
dbt-ml-preprocessing copied to clipboard
partial whitespace fix
This PR is just for code consideration. I have been using sqlfluff as a linter and am finding sometimes sqlfluff will lint the compiled code instead of the source code. The one_hot_encoder
compiled code is littered with whitespaces that might throw errors if sqlfluff were to lint it. Such as the trailing whitespaces after case
and then 1
in the macro I've partially edited. These whitespace issues as well as the amount of useless new lines created. I wanted to get the owners' thoughts on this cleanup and if this type of cleanup is worth the effort.
Example CTE before edits:
, ohe AS (
select
DateOfExport,
PlatformKey,
case
when HC_CareerTrack = 'Client Market' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Client__Market
,
case
when HC_CareerTrack = 'Client Delivery Operations' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Client_Delivery__Operations
,
case
when HC_CareerTrack = 'Corporate Function' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Corporate_Function
,
case
when HC_CareerTrack = 'Sales' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Sales
from "AEDW"."MLInsights"."jes_stg_headcount"
)
Example CTE after edits:
, ohe AS (
select
DateOfExport,
PlatformKey,
case
when HC_CareerTrack = 'Client Market' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Client__Market,
case
when HC_CareerTrack = 'Client Delivery Operations' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Client_Delivery__Operations,
case
when HC_CareerTrack = 'Corporate Function' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Corporate_Function,
case
when HC_CareerTrack = 'Sales' then 1
when HC_CareerTrack in ('Client Market','Client Delivery Operations','Corporate Function','Sales') then 0
else cast('Error: unknown value found and handle_unknown parameter was "error"' as bit)
end as is_HC_CareerTrack_Sales
from "AEDW"."MLInsights"."jes_stg_headcount"
)
Thanks for the contribution @swanjson ! I think this is absolutely worth the effort, for sqlfluff and also for neater queries to see in the DBMS query history.
does that second example after edits pass sqlfluff in its current form?
@jamesweakley
I'm not entirely sure the way that sqlfluff operates on the backend, so I apologize for only telling you the bits that I know. Using sqlfluff with the dbt-templater, as I understand, runs the linting process on the code compiled by dbt, which means it should be linting the above code; however, I'm finding sqlfluff is passing in cases above. My sqlfluff configs have project rules for leading commas and no trailing whitespaces, which should make both of these cases fail. Sqlfluff must have some sort of ignore for those rules when linting on the dbt-templater compiled code. I'm not sure this change is even necessary for functionality without knowing how and upon which code sqlfluff is linting.
I believe the errors I was getting were not at one_hot_encoder's fault, but somewhere else in my codebase. Both of the above versions are working for me now.
Even still the proposed changes might be beneficial for compiled code readability.
I only looked at the two functions creating the CASE/WHEN statements, but the whitespace above the SELECT keyword seems to be generated by functions elsewhere in the scripts, if it's of interest to change that.