incubator-uniffle icon indicating copy to clipboard operation
incubator-uniffle copied to clipboard

[Improvement][Aqe] Sort MapId before the data are flushed

Open jerqi opened this issue 3 years ago • 2 comments

When we use aqe, we need use mapId to filter the data which we don't need, If we sort MapId before the data are flushed. We split the data to segments, if a segment don't have the data which we want to read, we will drop the data. If data is sorted by mapId, we can filter more data and mprove our performance.

jerqi avatar Aug 06 '22 03:08 jerqi

Do we need to sort data by MapID of one partition before flushing data for all jobs? I think no. This will bring unused cost for those non-AQE optimized stages. Maybe we could sort the partition data by MapId when AQE's specified ShufflePartitionSpec is applied in first time.

zuston avatar Aug 28 '22 12:08 zuston

You are right.

jerqi avatar Aug 28 '22 14:08 jerqi

Do u have implemented this in your internal version? If not, I'm interested on this. @jerqi

zuston avatar Oct 09 '22 03:10 zuston

No. You can go ahead.

jerqi avatar Oct 09 '22 03:10 jerqi

I propose the design of this issue. https://docs.google.com/document/d/1G0cOFVJbYLf2oX1fiadh7zi2M6DlEcjTQTh4kSkb0LA/edit?usp=sharing

PTAL @jerqi

zuston avatar Oct 28 '22 06:10 zuston

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

jerqi avatar Oct 28 '22 07:10 jerqi

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

zuston avatar Oct 28 '22 07:10 zuston

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

Yes, we only need local order. If we have local order, we can filter much data effectively.

jerqi avatar Oct 28 '22 07:10 jerqi

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

Yes, we only need local order. If we have local order, we can filter much data effectively.

Emm... I remember you prefer only sort the index-file instead of data-file, which is mentioned in offline meeting. Do i misunderstand you?

zuston avatar Oct 28 '22 07:10 zuston

It's better to sort MapId before the data are flushed.It won't bring too much cost for non-AQE optimized stages.

Does data need to sort by mapId?

Yes, we only need local order. If we have local order, we can filter much data effectively.

Emm... I remember you prefer only sort the index-file instead of data-file, which is mentioned in offline meeting. Do i misunderstand you?

Give an example: We have three buffers to flush, they taskId 1 block, taskId 2 block, taskId 3 block. We should sort them to taskId 1 block, taskId 2 block, taskId 3 block. And then we can flush them to disks.Then we receive taskId 2 block, taskId 6 block, taskId 1 block, we sort them and flush them, so currently the data on the disk should be taskId 1 block , taskId 2 block, taskId 3 block, taskId 1 block, taskId 2 block, taskId 6 block. The data only have local order.

jerqi avatar Oct 28 '22 07:10 jerqi

taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block, taskId-2 block, taskId-6 block.

If one reader want the data from taskId=1, so it still want to read the data segment from taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block. The data of taskId-2 block, taskId-3 block is unnecessary for this reader. Right?

zuston avatar Oct 28 '22 08:10 zuston

taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block, taskId-2 block, taskId-6 block.

If one reader want the data from taskId=1, so it still want to read the data segment from taskId-1 block , taskId-2 block, taskId-3 block, taskId-1 block. The data of taskId-2 block, taskId-3 block is unnecessary for this reader. Right?

Yes.

jerqi avatar Oct 28 '22 08:10 jerqi

This looks ineffective and it's the same with the original block filter.

zuston avatar Oct 28 '22 08:10 zuston

This looks ineffective and it's the same with the original block filter.

Actually considering random io, It will cost the same time when you read 3 records or 2 records.

jerqi avatar Oct 28 '22 08:10 jerqi

This looks ineffective and it's the same with the original block filter.

Actually considering random io, It will cost the same time when you read 3 records or 2 records.

Yes. According to the problems mentioned by proposal design motivation section, the key point is a lot of data read by multiple times which depends on split number optimized by AQE. From this view, we should sort the data file.

zuston avatar Oct 28 '22 08:10 zuston

This looks ineffective and it's the same with the original block filter.

Actually considering random io, It will cost the same time when you read 3 records or 2 records.

Yes. According to the problems mentioned by proposal design motivation section, the key point is a lot of data read by multiple times which depends on split number optimized by AQE. From this view, we should sort the data file.

We don't need global order, local order should be enough.

jerqi avatar Oct 28 '22 09:10 jerqi

#293

zuston avatar Nov 03 '22 11:11 zuston