incubator-uniffle
incubator-uniffle copied to clipboard
[Improvement][Aqe] Sort MapId before the data are flushed
When we use aqe, we need use mapId to filter the data which we don't need, If we sort MapId before the data are flushed. We split the data to segments, if a segment don't have the data which we want to read, we will drop the data. If data is sorted by mapId, we can filter more data and mprove our performance.
Do we need to sort data by MapID of one partition before flushing data for all jobs? I think no. This will bring unused cost for those non-AQE optimized stages. Maybe we could sort the partition data by MapId when AQE's specified ShufflePartitionSpec is applied in first time.
You are right.
Do u have implemented this in your internal version? If not, I'm interested on this. @jerqi
No. You can go ahead.
I propose the design of this issue. https://docs.google.com/document/d/1G0cOFVJbYLf2oX1fiadh7zi2M6DlEcjTQTh4kSkb0LA/edit?usp=sharing
PTAL @jerqi
It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.
It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.
Does data need to sort by mapId?
It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.
Does data need to sort by mapId?
Yes, we only need local order. If we have local order, we can filter much data effectively.
It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.
Does data need to sort by mapId?
Yes, we only need local order. If we have local order, we can filter much data effectively.
Emm... I remember you prefer only sort the index-file instead of data-file, which is mentioned in offline meeting. Do i misunderstand you?
It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.
Does data need to sort by mapId?
Yes, we only need local order. If we have local order, we can filter much data effectively.
Emm... I remember you prefer only sort the index-file instead of data-file, which is mentioned in offline meeting. Do i misunderstand you?
Give an example: We have three buffers to flush, they taskId 1 block, taskId 2 block, taskId 3 block. We should sort them to taskId 1 block, taskId 2 block, taskId 3 block. And then we can flush them to disks.Then we receive taskId 2 block, taskId 6 block, taskId 1 block, we sort them and flush them, so currently the data on the disk should be taskId 1 block , taskId 2 block, taskId 3 block, taskId 1 block, taskId 2 block, taskId 6 block. The data only have local order.
taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block, taskId-2 block, taskId-6 block.
If one reader want the data from taskId=1, so it still want to read the data segment from taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block. The data of taskId-2 block, taskId-3 block is unnecessary for this reader. Right?
taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block, taskId-2 block, taskId-6 block.
If one reader want the data from taskId=1, so it still want to read the data segment from
taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block. The data oftaskId-2 block, taskId-3 blockis unnecessary for this reader. Right?
Yes.
This looks ineffective and it's the same with the original block filter.
This looks ineffective and it's the same with the original block filter.
Actually considering random io, It will cost the same time when you read 3 records or 2 records.
This looks ineffective and it's the same with the original block filter.
Actually considering random io, It will cost the same time when you read 3 records or 2 records.
Yes. According to the problems mentioned by proposal design motivation section, the key point is a lot of data read by multiple times which depends on split number optimized by AQE. From this view, we should sort the data file.
This looks ineffective and it's the same with the original block filter.
Actually considering random io, It will cost the same time when you read 3 records or 2 records.
Yes. According to the problems mentioned by proposal design motivation section, the key point is a lot of data read by multiple times which depends on split number optimized by AQE. From this view, we should sort the data file.
We don't need global order, local order should be enough.
#293