incubator-uniffle [Improvement][Aqe] Sort MapId before the data are flushed

When we use aqe, we need use mapId to filter the data which we don't need, If we sort MapId before the data are flushed. We split the data to segments, if a segment don't have the data which we want to read, we will drop the data. If data is sorted by mapId, we can filter more data and mprove our performance.

Aug 06 '22 03:08 jerqi

Do we need to sort data by MapID of one partition before flushing data for all jobs? I think no. This will bring unused cost for those non-AQE optimized stages. Maybe we could sort the partition data by MapId when AQE's specified ShufflePartitionSpec is applied in first time.

Aug 28 '22 12:08 zuston

You are right.

Aug 28 '22 14:08 jerqi

Do u have implemented this in your internal version? If not, I'm interested on this. @jerqi

Oct 09 '22 03:10 zuston

No. You can go ahead.

Oct 09 '22 03:10 jerqi

I propose the design of this issue. https://docs.google.com/document/d/1G0cOFVJbYLf2oX1fiadh7zi2M6DlEcjTQTh4kSkb0LA/edit?usp=sharing

PTAL @jerqi

Oct 28 '22 06:10 zuston

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Oct 28 '22 07:10 jerqi

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

Oct 28 '22 07:10 zuston

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

Yes, we only need local order. If we have local order, we can filter much data effectively.

Oct 28 '22 07:10 jerqi

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

Yes, we only need local order. If we have local order, we can filter much data effectively.

Emm... I remember you prefer only sort the index-file instead of data-file, which is mentioned in offline meeting. Do i misunderstand you?

Oct 28 '22 07:10 zuston

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

Yes, we only need local order. If we have local order, we can filter much data effectively.

Emm... I remember you prefer only sort the index-file instead of data-file, which is mentioned in offline meeting. Do i misunderstand you?

Give an example: We have three buffers to flush, they taskId 1 block, taskId 2 block, taskId 3 block. We should sort them to taskId 1 block, taskId 2 block, taskId 3 block. And then we can flush them to disks.Then we receive taskId 2 block, taskId 6 block, taskId 1 block, we sort them and flush them, so currently the data on the disk should be taskId 1 block , taskId 2 block, taskId 3 block, taskId 1 block, taskId 2 block, taskId 6 block. The data only have local order.

Oct 28 '22 07:10 jerqi

taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block, taskId-2 block, taskId-6 block.

If one reader want the data from taskId=1, so it still want to read the data segment from taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block. The data of taskId-2 block, taskId-3 block is unnecessary for this reader. Right?

Oct 28 '22 08:10 zuston

taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block, taskId-2 block, taskId-6 block.

If one reader want the data from taskId=1, so it still want to read the data segment from taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block. The data of taskId-2 block, taskId-3 block is unnecessary for this reader. Right?

Yes.

Oct 28 '22 08:10 jerqi

This looks ineffective and it's the same with the original block filter.

Oct 28 '22 08:10 zuston

This looks ineffective and it's the same with the original block filter.

Actually considering random io, It will cost the same time when you read 3 records or 2 records.

Oct 28 '22 08:10 jerqi

This looks ineffective and it's the same with the original block filter.

Actually considering random io, It will cost the same time when you read 3 records or 2 records.

Yes. According to the problems mentioned by proposal design motivation section, the key point is a lot of data read by multiple times which depends on split number optimized by AQE. From this view, we should sort the data file.

Oct 28 '22 08:10 zuston

This looks ineffective and it's the same with the original block filter.

Actually considering random io, It will cost the same time when you read 3 records or 2 records.

Yes. According to the problems mentioned by proposal design motivation section, the key point is a lot of data read by multiple times which depends on split number optimized by AQE. From this view, we should sort the data file.

We don't need global order, local order should be enough.

Oct 28 '22 09:10 jerqi

#293

Nov 03 '22 11:11 zuston

incubator-uniffle incubator-uniffle copied to clipboard

[Improvement][Aqe] Sort MapId before the data are flushed

incubator-uniffle
incubator-uniffle copied to clipboard