hadoop-connectors
Performance degradation when upgrading to hadoop3-2.2.8
When upgrading from hadoop3-1.9.17 to hadoop3-2.2.8 (using the shaded jar of the new version), I saw a performance degradation that almost doubled the run time of my tests.
I also created this Stack Overflow question.
I have a performance test case that runs against my file system implementation, which uses org.apache.hadoop.fs.FileSystem. The test runs several operations [create, read, write, rename, checkIfExists, mkDir] on 100 files with multiple threads.
I ran the same tests several times on both versions of the Hadoop connector, and the new one [2.2.8] shows a slower overall execution time (roughly 2-2.2X the old connector's time).
Below is a comparison of the average execution time for each operation with each connector version:
operation, hadoop3-1.9.17, hadoop3-2.2.8, slowdown (new / old)
READ, 4542.71, 10171.26, ~2.2X
RENAME, 1347.75, 4483.27, ~3.3X
EXISTS, 47.23, 1538.74, ~32.6X
CREATE, 570.1, 1539.81, ~2.7X
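For reference, the slowdown per operation can be recomputed directly from the two averages. The snippet below is a minimal sketch that only divides the numbers reported above; the class and method names are made up for illustration and are not part of the connector or of my test code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class SlowdownRatios {
    /** Ratio of new to old average time; values above 1.0 mean the new version is slower. */
    static double slowdown(double oldAvg, double newAvg) {
        return newAvg / oldAvg;
    }

    public static void main(String[] args) {
        // Averages copied from the comparison above: {hadoop3-1.9.17, hadoop3-2.2.8}.
        Map<String, double[]> averages = new LinkedHashMap<>();
        averages.put("READ", new double[] {4542.71, 10171.26});
        averages.put("RENAME", new double[] {1347.75, 4483.27});
        averages.put("EXISTS", new double[] {47.23, 1538.74});
        averages.put("CREATE", new double[] {570.1, 1539.81});

        for (Map.Entry<String, double[]> entry : averages.entrySet()) {
            double[] t = entry.getValue();
            // Locale.ROOT keeps the decimal separator stable across machines.
            System.out.println(String.format(Locale.ROOT, "%-7s %.1fx",
                    entry.getKey(), slowdown(t[0], t[1])));
        }
    }
}
```

EXISTS stands out: its ratio is an order of magnitude larger than the ratios of the other three operations.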
I have checked this GitHub issue and tried to follow its recommendations for tuning performance through the configs/params, but I did not see any improvement.
Are there any guidelines on parameter configuration to improve the times of the operations above?
Or could this performance issue be caused by some incompatibility among the jars on my classpath? Even though I am using the shaded jar, can other jars interfere?
Here is the list of jars I have on my classpath:
- gcs-connector-hadoop3-2.2.8-shaded.jar
- google-extensions-0.7.1.jar
- google-api-client-1.32.2.jar
- google-http-client-apache-v2-1.40.1.jar
- proto-google-common-protos-2.7.3.jar
- google-http-client-1.41.8.jar
- google-oauth-client-1.33.3.jar
- google-http-client-jackson2-1.40.1.jar
- grpc-google-cloud-storage-v2-2.2.2-alpha.jar
- google-http-client-gson-1.41.8.jar
- google-cloud-monitoring-1.82.0.jar
- google-cloud-core-http-2.5.4.jar
- proto-google-cloud-storage-v2-2.2.2-alpha.jar
- google-api-client-jackson2-1.32.2.jar
- google-api-services-iamcredentials-v1-rev20210326-1.32.1.jar
- google-oauth-client-java6-1.27.0.jar
- google-cloud-core-grpc-2.5.4.jar
- google-http-client-appengine-1.34.2.jar
- google-cloud-core-2.5.4.jar
- google-auth-library-credentials-1.7.0.jar
- google-cloud-storage-1.106.0.jar
- proto-google-iam-v1-1.2.3.jar
- google-api-services-storage-v1-rev20211018-1.32.1.jar
- google-auth-library-oauth2-http-1.7.0.jar
- proto-google-cloud-monitoring-v3-1.64.0.jar
- grpc-services-1.43.2.jar
- grpc-netty-shaded-1.43.2.jar
- grpc-alts-1.43.2.jar
- grpc-stub-1.43.2.jar
- grpc-census-1.43.2.jar
- grpc-protobuf-1.43.2.jar
- grpc-api-1.43.2.jar
- grpc-xds-1.43.2.jar
- grpc-core-1.43.2.jar
- grpc-protobuf-lite-1.43.2.jar
- grpc-context-1.43.2.jar
- opencensus-contrib-grpc-metrics-0.31.0.jar
- grpc-auth-1.43.2.jar
- gax-grpc-2.7.1.jar
- grpc-grpclb-1.43.2.jar
- api-common-2.1.4.jar
- gax-2.7.1.jar
- gax-httpjson-0.73.0.jar
- util-2.2.8.jar
- util-hadoop-hadoop3-2.2.8.jar
- auto-value-annotations-1.9.jar
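One way to check whether several of these jars provide the same class is to ask the class loader for every location of a given class file. The following is a rough sketch using only the JDK; ClasspathCheck and locationsOf are hypothetical names, and the default class probed here (com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem) is just an assumption about what is worth checking. Other class names can be passed as arguments.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class ClasspathCheck {
    /** Returns every location visible to the system class loader that provides the class. */
    static List<URL> locationsOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(ClassLoader.getSystemClassLoader().getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: probe the connector's FileSystem class unless names are given.
        String[] names = args.length > 0
                ? args
                : new String[] {"com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"};
        for (String name : names) {
            List<URL> hits = locationsOf(name);
            System.out.println(name + " found in " + hits.size() + " location(s)");
            for (URL hit : hits) {
                System.out.println("  " + hit);
            }
        }
    }
}
```

If a class is reported in more than one location, the copy that gets loaded depends on classpath order, so this would have to be run with the same classpath as the tests.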
My File class, which has methods like write, read, etc.:
class File {
    private String path;    // full object path, e.g. gs://some-bucket/objectX
    private FileSystem fs;  // org.apache.hadoop.fs.FileSystem
}
Here is how my write method is implemented:
@Override
public OutputStream write(boolean overwriteIfExists) throws IOException {
    // FileSystem.create takes an org.apache.hadoop.fs.Path, so the String path is wrapped
    return fs.create(new Path(path), overwriteIfExists);
}
And my read method:
@Override
public InputStream read() throws IOException {
    // FileSystem.open takes an org.apache.hadoop.fs.Path, so the String path is wrapped
    return fs.open(new Path(path));
}
My test case simply creates many threads, each with its own instance of a File object that has a different path (a path to a unique GCS bucket object, i.e. gs://some-bucket/objectX), and then performs an operation, for example a read.
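The timing part of such a test can be sketched roughly as follows. This is not my actual test code, only a simplified illustration with JDK classes: OperationTimer and averageMillis are hypothetical names, and each task stands for one operation on one File instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OperationTimer {
    /**
     * Runs every task on a fixed-size thread pool and returns the average
     * wall-clock duration of a single task in milliseconds.
     */
    static double averageMillis(List<? extends Callable<?>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> timings = new ArrayList<>();
            for (Callable<?> task : tasks) {
                // Each task measures only its own execution, not its time in the queue.
                timings.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    task.call();
                    return System.nanoTime() - start;
                }));
            }
            long totalNanos = 0;
            for (Future<Long> timing : timings) {
                totalNanos += timing.get();
            }
            return totalNanos / 1_000_000.0 / tasks.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in tasks; in the real test each one would call a File method.
        List<Callable<Object>> tasks = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            tasks.add(() -> {
                Thread.sleep(25);
                return null;
            });
        }
        System.out.println("average ms per task: " + averageMillis(tasks, 4));
    }
}
```

In the real test, each stand-in task would be replaced by a call on a distinct File instance, for example a task that calls read() and closes the returned stream.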