incubator-uniffle [Improvement] Optimize local disk selection strategy

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Search before asking

[X] I have searched in the issues and found no similar issues.

What would you like to be improved?

I am raising this issue to improve stability when using the MEMORY_LOCALFILE storage type. Some related issues may become sub-tasks of this improvement.

The first improvement is to avoid failing all apps when a single disk's usage reaches the high watermark. We could make the following optimizations.

Introduce metrics for the top 10 apps by number of written bytes #333.
Introduce free-space and total-space metrics for every local disk.
Introduce a pluggable disk selection strategy. Currently the disk is selected based on a hash. A free-capacity-based strategy should be supported.
Allow an app to write data to another disk when its assigned disk reaches the high watermark #306.
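
To make the last two items concrete, here is a minimal sketch of what a pluggable strategy with a high-watermark filter could look like. All names here (DiskInfo, DiskSelectionStrategy, HashBasedStrategy, FreeCapacityStrategy, DiskSelector) are hypothetical and chosen for illustration; they are not existing Uniffle classes, and the 0.9 watermark is an assumed value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of one local disk; not Uniffle's actual storage class.
final class DiskInfo {
    final String path;
    final long totalBytes;
    final long freeBytes;

    DiskInfo(String path, long totalBytes, long freeBytes) {
        this.path = path;
        this.totalBytes = totalBytes;
        this.freeBytes = freeBytes;
    }

    // Fraction of the disk in use; a disk reporting zero capacity counts as full.
    double usedRatio() {
        return totalBytes == 0 ? 1.0 : 1.0 - (double) freeBytes / totalBytes;
    }
}

// Hypothetical plug-in point: implementations pick one disk from the candidates.
interface DiskSelectionStrategy {
    DiskInfo select(List<DiskInfo> candidates, String appId, int shuffleId, int partitionId);
}

// Hash-based selection: stateless, needs no per-partition metadata.
final class HashBasedStrategy implements DiskSelectionStrategy {
    @Override
    public DiskInfo select(List<DiskInfo> candidates, String appId, int shuffleId, int partitionId) {
        int hash = (appId + "/" + shuffleId + "/" + partitionId).hashCode();
        return candidates.get(Math.floorMod(hash, candidates.size()));
    }
}

// Free-capacity-based selection: always pick the disk with the most free bytes.
final class FreeCapacityStrategy implements DiskSelectionStrategy {
    @Override
    public DiskInfo select(List<DiskInfo> candidates, String appId, int shuffleId, int partitionId) {
        return candidates.stream()
                .max(Comparator.comparingLong((DiskInfo d) -> d.freeBytes))
                .get(); // safe: DiskSelector never passes an empty list
    }
}

final class DiskSelector {
    // Drop disks at or above the high watermark, then delegate to the strategy.
    static DiskInfo choose(List<DiskInfo> disks, double highWatermark, DiskSelectionStrategy strategy,
                           String appId, int shuffleId, int partitionId) {
        List<DiskInfo> writable = new ArrayList<>();
        for (DiskInfo d : disks) {
            if (d.usedRatio() < highWatermark) {
                writable.add(d);
            }
        }
        if (writable.isEmpty()) {
            throw new IllegalStateException("all local disks are above the high watermark");
        }
        return strategy.select(writable, appId, shuffleId, partitionId);
    }
}

final class DiskSelectionDemo {
    public static void main(String[] args) {
        List<DiskInfo> disks = Arrays.asList(
                new DiskInfo("/data1", 100L, 5L),   // 95% used, above the watermark
                new DiskInfo("/data2", 100L, 60L),
                new DiskInfo("/data3", 100L, 30L));
        DiskInfo chosen = DiskSelector.choose(disks, 0.9, new FreeCapacityStrategy(), "app-1", 0, 0);
        System.out.println(chosen.path); // prints /data2
    }
}
```

One caveat with this sketch: because the hash strategy indexes into the filtered list, a disk crossing the watermark changes the hash-to-disk mapping for new partitions on the remaining disks as well. A real implementation would also need to remember where already-written partitions live.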

How should we improve?

No response

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Nov 29 '22 07:11 zuston

PTAL @jerqi @xianjingfeng @leixm @smallzhongfeng @kaijchen

Nov 29 '22 07:11 zuston

We chose the hash selection strategy because we want to reduce the size of the metadata that we need to maintain in memory.

Nov 29 '22 10:11 jerqi

Can we use Consistent Hashing?
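
For reference, a minimal sketch of what consistent hashing over local disks could look like. The class name DiskHashRing, the use of CRC32 as the hash function, and the virtual-node count are all assumptions for illustration, not anything that exists in Uniffle. The ring stores only a fixed number of entries per disk, so no per-partition metadata is kept in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical consistent-hash ring that maps partition keys to disk paths.
final class DiskHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    DiskHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is used only because it is deterministic and in the JDK.
    private static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Place several virtual nodes per disk to spread keys more evenly.
    void addDisk(String path) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.putIfAbsent(hash(path + "#" + i), path);
        }
    }

    // Remove only the ring positions that belong to this disk.
    void removeDisk(String path) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(path + "#" + i), path);
        }
    }

    // Walk clockwise from the key's position to the next virtual node, wrapping around.
    String locate(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no disks registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}

final class DiskHashRingDemo {
    public static void main(String[] args) {
        DiskHashRing ring = new DiskHashRing(64);
        ring.addDisk("/data1");
        ring.addDisk("/data2");
        ring.addDisk("/data3");
        String key = "app-1/0/42"; // appId/shuffleId/partitionId
        System.out.println(key + " -> " + ring.locate(key));
        ring.removeDisk("/data2"); // e.g. the disk reached the high watermark
        System.out.println(key + " -> " + ring.locate(key));
    }
}
```

When a disk is taken out of the ring, only the keys that mapped to that disk move; keys on the other disks keep their placement, which a plain modulo hash does not guarantee.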

Nov 30 '22 04:11 xianjingfeng

Introduce a pluggable disk selection strategy. Currently the disk is selected based on a hash. A free-capacity-based strategy should be supported.

Agreed. Currently the hash-based strategy may cause unbalanced disk I/O across disks, since apps' shuffle patterns may vary dramatically. A capacity- and disk-stats-based strategy would be very nice to have.

Dec 09 '22 08:12 advancedxy

Introduce free-space and total-space metrics for every local disk.

@zuston how do you plan to collect these metrics? By using df, or some other fancy way?
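
For what it's worth, one option that avoids shelling out to df is the JDK's own java.io.File API. A rough sketch, where DiskSpaceSnapshot is a made-up name and how the values get exported as metrics is left open:

```java
import java.io.File;

// Hypothetical holder for one sample of a local disk's capacity.
final class DiskSpaceSnapshot {
    final String path;
    final long totalBytes;
    final long usableBytes;

    private DiskSpaceSnapshot(String path, long totalBytes, long usableBytes) {
        this.path = path;
        this.totalBytes = totalBytes;
        this.usableBytes = usableBytes;
    }

    // File.getTotalSpace() and File.getUsableSpace() report on the partition
    // that contains the path; both return 0 if the path does not exist.
    static DiskSpaceSnapshot of(String path) {
        File dir = new File(path);
        return new DiskSpaceSnapshot(path, dir.getTotalSpace(), dir.getUsableSpace());
    }

    // Fraction in use; an unreadable or missing path is treated as full.
    double usedRatio() {
        return totalBytes == 0 ? 1.0 : 1.0 - (double) usableBytes / totalBytes;
    }
}

final class DiskSpaceDemo {
    public static void main(String[] args) {
        // Sample the partition holding the current working directory.
        DiskSpaceSnapshot s = DiskSpaceSnapshot.of(new File(".").getAbsolutePath());
        System.out.println(s.path + " total=" + s.totalBytes
                + " usable=" + s.usableBytes + " usedRatio=" + s.usedRatio());
    }
}
```

java.nio.file.Files.getFileStore(path) exposes the same numbers through FileStore.getTotalSpace() and FileStore.getUsableSpace(), and throws IOException instead of returning 0 when the path is missing.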

Dec 13 '22 09:12 advancedxy

Interesting feature

Nov 16 '24 14:11 maobaolong