incubator-uniffle icon indicating copy to clipboard operation
incubator-uniffle copied to clipboard

Use NIO's Files API to replace FileInputStream/FileOutputStream in some paths

Open zuston opened this issue 3 years ago • 10 comments

What changes were proposed in this pull request?

Use NIO's Files API to replace FileInputStream/FileOutputStream in some paths.

Why are the changes needed?

Follow this PR of spark: https://github.com/apache/spark/pull/20144 and https://github.com/apache/spark/pull/18684

Refer to reason of this change:

Java's FileInputStream and FileOutputStream overrides finalize(), even this file input/output stream is closed correctly and promptly, it will still leave some memory footprints which will only get cleaned in Full GC. This will introduce two side effects: Lots of memory footprints regarding to Finalizer will be kept in memory and this will increase the memory overhead. In our use case of external shuffle service, a busy shuffle service will have bunch of this object and potentially lead to OOM. The Finalizer will only be called in Full GC, and this will increase the overhead of Full GC and lead to long GC pause. https://bugs.openjdk.java.net/browse/JDK-8080225 https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful So to fix this potential issue, here propose to use NIO's Files#newInput/OutputStream instead in some critical paths like shuffle.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No need.

zuston avatar Jul 21 '22 10:07 zuston

Codecov Report

Merging #65 (e82169b) into master (bdffcaa) will increase coverage by 1.22%. The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master      #65      +/-   ##
============================================
+ Coverage     55.15%   56.38%   +1.22%     
- Complexity     1112     1172      +60     
============================================
  Files           149      149              
  Lines          7969     7969              
  Branches        761      761              
============================================
+ Hits           4395     4493      +98     
+ Misses         3332     3233      -99     
- Partials        242      243       +1     
Impacted Files Coverage Δ
...e/uniffle/storage/handler/impl/HdfsFileWriter.java 55.22% <ø> (ø)
.../java/org/apache/uniffle/common/util/RssUtils.java 65.71% <100.00%> (+0.24%) :arrow_up:
...org/apache/uniffle/server/LocalStorageChecker.java 69.56% <100.00%> (ø)
.../uniffle/storage/handler/impl/LocalFileWriter.java 90.00% <100.00%> (ø)
...ache/uniffle/storage/util/ShuffleStorageUtils.java 61.00% <100.00%> (ø)
...org/apache/uniffle/server/ShuffleFlushManager.java 76.70% <0.00%> (-1.71%) :arrow_down:
...he/uniffle/client/impl/ShuffleWriteClientImpl.java 26.05% <0.00%> (+0.09%) :arrow_up:
.../java/org/apache/uniffle/common/BufferSegment.java 100.00% <0.00%> (+3.84%) :arrow_up:
.../org/apache/uniffle/common/ShuffleIndexResult.java 100.00% <0.00%> (+14.28%) :arrow_up:
...apache/uniffle/common/ShufflePartitionedBlock.java 70.58% <0.00%> (+35.29%) :arrow_up:
... and 6 more

codecov-commenter avatar Jul 21 '22 10:07 codecov-commenter

Could you add some performance results? Should we only modify the critical code?

jerqi avatar Jul 22 '22 02:07 jerqi

Could you add some performance results? Should we only modify the critical code?

Actually i didn't do any performance test. Just found this optimization while I was browsing spark codebase. Maybe we should ping @jerryshao(the author of spark-21745)

zuston avatar Jul 22 '22 03:07 zuston

…me paths

Modify the title and description.

jerqi avatar Jul 22 '22 03:07 jerqi

@zuston From https://github.com/apache/spark/pull/20119, there has performance issue with NIO's Files API. I think read data from shuffle server can be improved with solution as zero-copy?

colinmjj avatar Jul 22 '22 07:07 colinmjj

@zuston From apache/spark#20119, there has performance issue with NIO's Files API.

The performance issue is caused by the default InputStream.skip implementation. So i dont replace it in the uniffle LocalFileReader class.

I think read data from shuffle server can be improved with solution as zero-copy?

After browsing spark related codebase, i think maybe using NIO api of filechannel will improve the performance. But just guess, i have no practice on this and need more test.

zuston avatar Jul 22 '22 08:07 zuston

Any progress? What do i need to do before merging @jerqi

zuston avatar Jul 25 '22 15:07 zuston

Any progress? Do i need to do what before merging @jerqi

I think we need some performance tests to prove the effectiveness of change.

jerqi avatar Jul 25 '22 15:07 jerqi

I think we could directly follow the Spark change. Besides, the performance test looks hard. @jerqi

zuston avatar Jul 26 '22 11:07 zuston

I think we could directly follow the Spark change. Besides, the performance test looks hard. @jerqi

I prefer not merging this pr without test. Maybe we can solve this problem util we encounter the similar, at that time, we have the situation to prove the effect of pr.

jerqi avatar Jul 26 '22 13:07 jerqi