byzer-lang
When I join two tables, the columns of the result end up in the wrong order.
Here is my code:
select
    owner,
    owner_email,
    owner_mgr,
    owner_mgr_email,
    week_begin,
    actual_hour,
    working_days*8 as working_hour
from (
    select
        owner,
        owner_email,
        owner_mgr,
        owner_mgr_email,
        week_begin,
        sum(ts_hour) as actual_hour
    from workload_union
    where owner_active=1
    group by owner, owner_email, owner_mgr, owner_mgr_email, week_begin
) ht left join week_calendar wc on ht.week_begin = wc.week
as hour_table;
select
    tpe as tpe,
    wl.description,
    item as item,
    ky_project_id as ky_project_id,
    ky_project_name as ky_project_name,
    wl.ky_customer_id as ky_customer_id,
    occur_date as occur_date,
    ...
    outlier as outlier
from
    workload_union wl left join hour_table ht on wl.week_begin = ht.week_begin and wl.owner_email = ht.owner_email
as workload3;
I expect description to appear as the second column, but it always appears as the last column.
This may be caused by Spark RDD sorting.
See: https://stackoverflow.com/questions/52434075/scala-spark-order-changes-when-writing-a-dataframe-to-a-csv-file
You can fix this with the TableRepartition ET. Here is the doc: https://docs.byzer.org/#/byzer-lang/zh-cn/extension/et/TableRepartition?id=%e8%a1%a8%e5%88%86%e5%8c%ba%e6%8f%92%e4%bb%b6-tablerepartition
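A minimal sketch of applying the TableRepartition ET to workload3 before writing it out. It assumes the ET takes a partitionNum parameter as described in the linked doc. The partition count, the output table name workload3_single, and the /tmp/workload3 path are hypothetical placeholders, not taken from the original code.

```sql
-- Sketch only: partitionNum="1" is an assumed setting; confirm the parameters in the TableRepartition doc.
-- Collapse workload3 into a single partition (workload3_single is a hypothetical table name).
run workload3 as TableRepartition.`` where partitionNum="1" as workload3_single;

-- Write the repartitioned table out; the CSV format and the /tmp/workload3 path are hypothetical.
save overwrite workload3_single as csv.`/tmp/workload3` where header="true";
```

With a single partition Spark writes one output file, which is the situation discussed in the Stack Overflow question. It is worth checking the saved header to confirm that description is now in the expected position.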