incubator-uniffle
Use NIO's Files API to replace FileInputStream/FileOutputStream in some paths
What changes were proposed in this pull request?
Use NIO's Files API to replace FileInputStream/FileOutputStream in some paths.
Why are the changes needed?
Follow these Spark PRs: https://github.com/apache/spark/pull/20144 and https://github.com/apache/spark/pull/18684
The reason for this change, quoted from those PRs:
Java's FileInputStream and FileOutputStream override finalize(). Even when such a stream is closed correctly and promptly, it still leaves a memory footprint that only gets cleaned up in a Full GC. This introduces two side effects:

- Lots of Finalizer-related objects are kept in memory, which increases memory overhead. In our use case of the external shuffle service, a busy shuffle service will accumulate a bunch of these objects and can potentially hit OOM.
- The Finalizer will only be called in a Full GC, which increases the overhead of Full GC and leads to long GC pauses.

See https://bugs.openjdk.java.net/browse/JDK-8080225 and https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful

So to fix this potential issue, this PR proposes to use NIO's Files#newInputStream/newOutputStream instead in some critical paths like shuffle.
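The replacement is mechanical. A minimal sketch of the pattern (hypothetical helper class, not the actual uniffle diff):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamOpen {
    // Before: new FileInputStream(path). FileInputStream overrides finalize(),
    // so every instance is tracked by a Finalizer and is only fully reclaimed
    // after a Full GC has run its finalize() method.
    //
    // After: Files.newInputStream returns a channel-backed stream with no
    // finalizer, so closed streams are reclaimed like ordinary objects.
    public static InputStream openRead(String path) throws IOException {
        return Files.newInputStream(Paths.get(path));
    }

    public static OutputStream openWrite(String path) throws IOException {
        return Files.newOutputStream(Paths.get(path));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("uniffle-demo", ".bin");
        try (OutputStream out = openWrite(tmp.toString())) {
            out.write(new byte[] {1, 2, 3});
        }
        try (InputStream in = openRead(tmp.toString())) {
            byte[] buf = new byte[3];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes, first=" + buf[0]);
        }
        Files.delete(tmp);
    }
}
```

The caller-facing types (InputStream/OutputStream) are unchanged, which is why the diff is low-risk.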
Does this PR introduce any user-facing change?
No
How was this patch tested?
No need.
Codecov Report
Merging #65 (e82169b) into master (bdffcaa) will increase coverage by 1.22%. The diff coverage is 100.00%.
```diff
@@             Coverage Diff              @@
##             master      #65      +/-  ##
============================================
+ Coverage     55.15%   56.38%   +1.22%
- Complexity     1112     1172      +60
============================================
  Files           149      149
  Lines          7969     7969
  Branches        761      761
============================================
+ Hits           4395     4493      +98
+ Misses         3332     3233      -99
- Partials        242      243       +1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...e/uniffle/storage/handler/impl/HdfsFileWriter.java | 55.22% <ø> (ø) | |
| .../java/org/apache/uniffle/common/util/RssUtils.java | 65.71% <100.00%> (+0.24%) | :arrow_up: |
| ...org/apache/uniffle/server/LocalStorageChecker.java | 69.56% <100.00%> (ø) | |
| .../uniffle/storage/handler/impl/LocalFileWriter.java | 90.00% <100.00%> (ø) | |
| ...ache/uniffle/storage/util/ShuffleStorageUtils.java | 61.00% <100.00%> (ø) | |
| ...org/apache/uniffle/server/ShuffleFlushManager.java | 76.70% <0.00%> (-1.71%) | :arrow_down: |
| ...he/uniffle/client/impl/ShuffleWriteClientImpl.java | 26.05% <0.00%> (+0.09%) | :arrow_up: |
| .../java/org/apache/uniffle/common/BufferSegment.java | 100.00% <0.00%> (+3.84%) | :arrow_up: |
| .../org/apache/uniffle/common/ShuffleIndexResult.java | 100.00% <0.00%> (+14.28%) | :arrow_up: |
| ...apache/uniffle/common/ShufflePartitionedBlock.java | 70.58% <0.00%> (+35.29%) | :arrow_up: |
| ... and 6 more | | |
Could you add some performance results? Should we only modify the critical code?
> Could you add some performance results? Should we only modify the critical code?

Actually I didn't do any performance test; I just found this optimization while browsing the Spark codebase. Maybe we should ping @jerryshao (the author of spark-21745).
Modify the title and description.
@zuston From https://github.com/apache/spark/pull/20119, there is a performance issue with NIO's Files API. I think reading data from the shuffle server could be improved with a zero-copy solution?

> From apache/spark#20119, there is a performance issue with NIO's Files API.

That performance issue is caused by the default InputStream.skip implementation, so I didn't replace the stream in the uniffle LocalFileReader class.

> I think reading data from the shuffle server could be improved with a zero-copy solution?

After browsing the related Spark codebase, I think using the NIO FileChannel API might improve performance. But that's just a guess; I have no practice with this and it needs more testing.
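For the zero-copy idea, the standard JVM mechanism is FileChannel.transferTo, which can move file bytes to a target channel without copying them through the Java heap (on Linux this can map to sendfile). A hedged sketch under that assumption, not tied to uniffle's actual reader code:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    // Sends [offset, offset + length) of `file` into `target` without
    // buffering the bytes in user space. transferTo may transfer fewer
    // bytes than requested, so we loop until done or EOF.
    public static long transfer(Path file, long offset, long length,
                                WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < length) {
                long n = src.transferTo(offset + sent, length - sent, target);
                if (n <= 0) {
                    break; // end of file, or the target cannot accept more
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

Positioned transfers also sidestep the InputStream.skip problem entirely, since the read offset is passed explicitly instead of being skipped to.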
Any progress? What do I need to do before merging? @jerqi

> Any progress? What do I need to do before merging?
I think we need some performance tests to prove the effectiveness of this change.
I think we could directly follow the Spark change. Besides, the performance test looks hard. @jerqi

> I think we could directly follow the Spark change. Besides, the performance test looks hard.
I prefer not to merge this PR without a test. Maybe we can leave this problem until we encounter a similar one; at that point we will be in a position to prove the effect of the PR.
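If someone wants to produce the numbers being asked for here, a rough harness could compare the two open paths. This is only a sketch; a proper benchmark should use JMH, since JIT warm-up and GC noise make naive wall-clock timing unreliable:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class OpenCloseBench {
    interface Opener { InputStream open(Path p) throws IOException; }

    // Times `iters` open/read/close cycles for a given stream factory.
    static long timeNanos(Opener opener, Path file, int iters) throws IOException {
        byte[] buf = new byte[8192];
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            try (InputStream in = opener.open(file)) {
                while (in.read(buf) != -1) { /* drain */ }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bench", ".bin");
        Files.write(file, new byte[64 * 1024]);
        int iters = 2000;
        // One warm-up pass, then measure; results still vary run to run.
        timeNanos(p -> new FileInputStream(p.toFile()), file, iters);
        long oldIo = timeNanos(p -> new FileInputStream(p.toFile()), file, iters);
        long nio = timeNanos(Files::newInputStream, file, iters);
        System.out.printf("FileInputStream: %d ms, Files.newInputStream: %d ms%n",
                oldIo / 1_000_000, nio / 1_000_000);
        Files.delete(file);
    }
}
```

Note that the claimed benefit of this PR is reduced Finalizer pressure during Full GC, which a small timing loop like this will not show; that effect is better observed in GC logs (or heap histograms counting java.lang.ref.Finalizer) on a long-running, busy shuffle server.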