incubator-uniffle [Improvement] Merge data file and index file

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Search before asking

[X] I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, we store the data file and the index file separately, which increases IO cost. We can merge them by referring to Celeborn's implementation. We will certainly need to consider more things, such as AQE. This improvement will increase the performance of RSS.

How should we improve?

No response

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

May 20 '23 05:05 jerqi

> We can merge them by referring to Celeborn's implementation. We will certainly need to consider more things, such as AQE.

I don't think you have described this optimization clearly, for example the data/index layout and how the client side reads it. Just referring to Celeborn is not enough.

May 21 '23 01:05 zuston

>> We can merge them by referring to Celeborn's implementation. We will certainly need to consider more things, such as AQE.

> I don't think you have described this optimization clearly, for example the data/index layout and how the client side reads it. Just referring to Celeborn is not enough.

Just a proposal, not a design doc.

May 21 '23 03:05 jerqi

I have thought about this, but I have no good ideas. Maybe we should also consider object storage here. I am going to store indexes in memory, as we discussed in #407.

May 22 '23 09:05 xianjingfeng

@xianjingfeng

Do you use object storage?
If we use object storage, why do we need to take it into account here?
I see #407; it seems that you ran into problems when using AQE. Have you tried Uniffle's local sort optimization?
We can merge the index and data files by referring to Celeborn. I can propose more details about Celeborn's approach.

May 22 '23 11:05 jerqi

> @xianjingfeng

> Do you use object storage?

> If we use object storage, why do we need to take it into account here?

> I see [FEATURE] Cache index files on the server side #407; it seems that you ran into problems when using AQE. Have you tried Uniffle's local sort optimization?

> We can merge the index and data files by referring to Celeborn. I can propose more details about Celeborn's approach.

We are going to use object storage: we will run Uniffle on the public cloud and use the object storage services of cloud vendors.
Most object storage systems do not support append mode, so we need to consider how to stay compatible with them. #391
Yes, I have used the local sort optimization, but it only helps with skipping data files.

May 22 '23 13:05 xianjingfeng

For object storage, I prefer adding a list operation to the shuffle server, such as listShuffleFiles, because the list operation of object storage usually performs poorly. Merging the data file and the index file is also better for object storage, so we can merge them when adding object storage support. I think this is better than caching index files.
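A minimal sketch of what such a list operation could look like, assuming the shuffle server keeps an in-memory record of every file it has flushed to object storage. Only the name listShuffleFiles comes from this thread; the class `ShuffleFileRegistry`, the methods `record_flush` and `list_shuffle_files`, the (appId, shuffleId, partitionId) key and the object-key naming are hypothetical. The point is that a client asks the shuffle server instead of issuing a LIST request against the object store.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical key: (appId, shuffleId, partitionId).
PartitionKey = Tuple[str, int, int]


class ShuffleFileRegistry:
    """Hypothetical in-memory record of files a shuffle server flushed to object storage."""

    def __init__(self) -> None:
        self._files: DefaultDict[PartitionKey, List[str]] = defaultdict(list)

    def record_flush(self, app_id: str, shuffle_id: int,
                     partition_id: int, object_key: str) -> None:
        # Called once the server has finished writing an object,
        # so only complete files are ever listed.
        self._files[(app_id, shuffle_id, partition_id)].append(object_key)

    def list_shuffle_files(self, app_id: str, shuffle_id: int,
                           partition_id: int) -> List[str]:
        # Answered from memory: no LIST request is sent to the object store.
        # dict.get avoids inserting empty entries for unknown partitions.
        return list(self._files.get((app_id, shuffle_id, partition_id), []))
```

A real implementation would also have to survive server restarts and merge results when several servers write the same partition; the sketch ignores both.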

May 22 '23 15:05 jerqi

Our index and data formats can be merged into one file, laid out like this:

blockNum
blockId1
    offset
    crc
    compressed length
    uncompressed length
    taskId
blockId2
    offset
    crc
    compressed length
    uncompressed length
    taskId
blockId3
    offset
    crc
    compressed length
    uncompressed length
    taskId
...
block1
block2
block3
...
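To make the layout concrete, here is a minimal Python sketch that encodes and decodes it. The field widths (8-byte blockId, offset, crc and taskId; 4-byte lengths), big-endian byte order, offsets relative to the start of the data section, zlib compression and CRC-32 over the compressed bytes are all assumptions for illustration, not Uniffle's actual encoding; `write_merged`, `read_index` and `read_block` are hypothetical names.

```python
import struct
import zlib
from typing import List, NamedTuple, Tuple

# Assumed fixed-width, big-endian encoding (like Java's DataOutputStream).
HEADER = struct.Struct(">i")        # blockNum
ENTRY = struct.Struct(">qqqiiq")    # blockId, offset, crc, compressed len, uncompressed len, taskId


class IndexEntry(NamedTuple):
    block_id: int
    offset: int               # assumed relative to the start of the data section
    crc: int                  # assumed CRC-32 of the compressed bytes
    compressed_length: int
    uncompressed_length: int
    task_id: int


def write_merged(blocks: List[Tuple[int, int, bytes]]) -> bytes:
    """Encode (blockId, taskId, raw bytes) tuples as header + index + data."""
    entries, payloads, offset = [], [], 0
    for block_id, task_id, raw in blocks:
        compressed = zlib.compress(raw)  # zlib stands in for the real codec
        entries.append(IndexEntry(block_id, offset, zlib.crc32(compressed),
                                  len(compressed), len(raw), task_id))
        payloads.append(compressed)
        offset += len(compressed)
    index = b"".join(ENTRY.pack(*e) for e in entries)
    return HEADER.pack(len(entries)) + index + b"".join(payloads)


def read_index(buf: bytes) -> List[IndexEntry]:
    """Parse only the header and index section; the data section is not touched."""
    (block_num,) = HEADER.unpack_from(buf, 0)
    return [IndexEntry(*ENTRY.unpack_from(buf, HEADER.size + i * ENTRY.size))
            for i in range(block_num)]


def read_block(buf: bytes, entry: IndexEntry) -> bytes:
    """Locate a block through its index entry, verify the CRC, then decompress."""
    (block_num,) = HEADER.unpack_from(buf, 0)
    start = HEADER.size + block_num * ENTRY.size + entry.offset
    compressed = buf[start:start + entry.compressed_length]
    if zlib.crc32(compressed) != entry.crc:
        raise ValueError(f"CRC mismatch for block {entry.block_id}")
    return zlib.decompress(compressed)
```

Because the index section precedes the data, the writer needs every block of the file before it can emit the header. That fits object stores without append, which take a whole object in one write, but it means the blocks for one file must be buffered until the object is written.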

May 22 '23 16:05 jerqi

@jerqi Could you describe in more detail how the above design integrates with object storage?

Will in-memory data be flushed to object storage using the above layout, or will local files be uploaded to object storage? It's not clear.

> For object storage, I prefer adding a list operation to the shuffle server, such as listShuffleFiles.

Sounds good.

May 23 '23 02:05 zuston

> For object storage, I prefer adding a list operation to the shuffle server, such as listShuffleFiles, because the list operation of object storage usually performs poorly. Merging the data file and the index file is also better for object storage, so we can merge them when adding object storage support. I think this is better than caching index files.

I think we'd better not merge the index and data into one file, because we would have to read the index from so many object files, which would be too slow when skipping some data.

From my perspective, the size of the index files is under control, which means we could store them on the local disk of the shuffle server. Separating index and data strikes a balance between speed and capacity.
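As a rough illustration of the skipping case, suppose the index stays on the shuffle server's local disk and the data lives in object storage. A reader can then filter index entries by task id (for example, when AQE reads only part of a partition's map outputs) and turn the surviving blocks into byte ranges before sending any request to object storage. The `IndexEntry` fields and the `ranges_to_fetch` helper are hypothetical. Adjacent blocks are coalesced, so a file whose blocks are grouped by task id (for instance, after a local sort) needs fewer ranged reads.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class IndexEntry(NamedTuple):
    block_id: int
    offset: int   # byte offset of the block inside the data file
    length: int   # stored (compressed) length in bytes
    task_id: int


def ranges_to_fetch(entries: Iterable[IndexEntry],
                    wanted_tasks: Set[int]) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, covering only the
    blocks of wanted tasks; blocks that touch each other are merged."""
    ranges: List[Tuple[int, int]] = []
    for e in sorted(entries, key=lambda x: x.offset):
        if e.task_id not in wanted_tasks:
            continue  # skipped blocks cost nothing on the object store
        start, end = e.offset, e.offset + e.length
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges
```

Each returned range would map to one ranged GET on the data object, while the index itself is read from local disk.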

Feb 08 '24 03:02 zuston