Windkit Li
The external one is a must for me, and the latter should be an extra.
## Write Performance Issue

The current write to a file via `s3a://` does not fit the S3 API well in terms of performance.
For a simple write, there are >3000 operations to LeoFS...
## Write Issue

From time to time, the write fails.

## Logs

**Normal**
[spark_write.access.log.gz](https://github.com/leo-project/leofs/files/741424/spark_write.access.log.gz)
[spark_write.spark.log.gz](https://github.com/leo-project/leofs/files/741425/spark_write.spark.log.gz)

**Error**
[spark_fail_write.access.log.gz](https://github.com/leo-project/leofs/files/741427/spark_fail_write.access.log.gz)
[spark_fail_write.spark.log.gz](https://github.com/leo-project/leofs/files/741426/spark_fail_write.spark.log.gz)
## Read Issue

Read requests are issued as Range GET (as tasks can be partitioned across multiple workers).
The cache in `leo_gateway` is not used.
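To illustrate what the gateway sees, here is a minimal sketch of how one object read turns into several Range GET requests when it is split across tasks. The function name and the split size are hypothetical, chosen only for illustration; how Spark and `s3a` actually pick split boundaries is not shown in the logs above.

```python
def range_headers(object_size, split_size):
    """Return one HTTP Range header value per split of an object.

    Hypothetical helper: object_size and split_size are in bytes.
    HTTP byte ranges are inclusive on both ends.
    """
    if object_size < 0 or split_size <= 0:
        raise ValueError("object_size must be >= 0 and split_size > 0")
    headers = []
    for start in range(0, object_size, split_size):
        # Last byte of this split, clamped to the end of the object.
        end = min(start + split_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


if __name__ == "__main__":
    # A 10-byte object read in 4-byte splits.
    for header in range_headers(10, 4):
        print("Range:", header)
```

Each split requests a different byte range of the same object, so the gateway receives several partial reads instead of one full GET.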
## Test with Spark 2.1.0 + Hadoop 3.0.0-alpha2

To test with a newer version of Hadoop, which includes performance improvements to S3 support.

## Logs

[spark_210_hadoop_3a2_write.access.log.gz](https://github.com/leo-project/leofs/files/762571/spark_210_hadoop_3a2_write.access.log.gz)

Number of requests decreases from...
## Small File Testing

**Data Set**
~3900 image files (total: ~170 MB)

## Hadoop Setting

1x Name node (+ Secondary Name Node)
3x Data node

### Read from Hadoop...
With a large data set (500 dirs x 1600 files), listing the objects took too long.

Known issue: https://github.com/leo-project/leofs/issues/548

It is quite difficult to work with...
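For a sense of scale, 500 dirs x 1600 files is 800,000 objects. Below is a rough sketch of the lower bound on list requests, assuming the listing is paged at 1000 keys per response (the S3 ListObjects maximum). Whether the client and LeoFS use that page size in this test is an assumption, and per-directory listings would only add to the count.

```python
def min_list_requests(num_objects, page_size=1000):
    """Lower bound on paged list requests needed to enumerate num_objects.

    page_size=1000 is the S3 ListObjects maximum; the page size actually
    used in this test is an assumption.
    """
    if num_objects < 0 or page_size <= 0:
        raise ValueError("num_objects must be >= 0 and page_size > 0")
    # Ceiling division without floating point.
    return -(-num_objects // page_size)


if __name__ == "__main__":
    total = 500 * 1600
    print(total, "objects need at least", min_list_requests(total), "list requests")
```

Even at the maximum page size, every full enumeration of the data set costs hundreds of sequential prefix searches on the storage side.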
@mocchira Yes, that's the way I am trying to work around the bottleneck. Will update here later
Gateway Logs (Added Copy Log at info)
[spark_fail.access.txt](https://github.com/leo-project/leofs/files/1118937/spark_fail.access.txt)
[spark_fail.info.txt](https://github.com/leo-project/leofs/files/1118938/spark_fail.info.txt)

Storage Logs (Added prefix search log at info)
[spark_fail.saccess.txt](https://github.com/leo-project/leofs/files/1118936/spark_fail.saccess.txt)
[spark_fail.sinfo.txt](https://github.com/leo-project/leofs/files/1118935/spark_fail.sinfo.txt)
- [ ] Investigate Write Problem
- [x] Complete Access Log on Gateway (Missing Copy, ...)
- [x] Complete Access Log on Storage (Prefix Search, ...)
- [x] Fix the...