
java.lang.NoSuchFieldError: INSTANCE exception, caused by http client version mismatch

Open zzbennett opened this issue 8 years ago • 7 comments

I'm trying to get the S3 connector working, but I keep running into this exception when I use S3AFileSystem (I would use NativeS3FileSystem, but for reasons I'll get into below I need S3A):

[2017-01-24 01:50:41,622] INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask:73)
org.apache.kafka.connect.errors.ConnectException: java.lang.reflect.InvocationTargetException
	at io.confluent.connect.hdfs.storage.StorageFactory.createStorage(StorageFactory.java:40)
	at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:171)
	at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:65)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:221)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:140)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at io.confluent.connect.hdfs.storage.StorageFactory.createStorage(StorageFactory.java:33)
	... 11 more
Caused by: java.lang.NoSuchFieldError: INSTANCE
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
	at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:87)
	at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:65)
	at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:58)
	at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:51)
	at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:39)
	at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:319)
	at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:303)
	at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:164)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:564)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:544)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:526)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2675)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:418)
	at com.qubole.streamx.s3.S3Storage.<init>(S3Storage.java:49)
	... 16 more

Some cursory googling suggests this is caused by a version conflict in the Apache HttpComponents libraries. In /home/ec2-user/streamx/target/streamx-0.1.0-SNAPSHOT-development/share/java/streamx/* there is httpcore-4.2.4.jar, while /usr/bin/../share/java/kafka, which is also on the classpath, has httpcore-4.4.3.jar. I can take a stab at fixing this, but I figured I'd file an issue in case it's a known problem and/or there is an established workaround.
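A minimal sketch of the clash, for anyone wanting to reproduce the diagnosis: it simulates the two classpath directories with empty stand-in files (the directory names and jar versions mirror this report; the layout itself is hypothetical) and counts the distinct httpcore versions that end up visible to Connect.

```shell
#!/bin/sh
# Sketch: two classpath directories each ship their own httpcore jar.
# Stand-in files only; point `find` at your real share/java dirs instead.
tmp=$(mktemp -d)
mkdir -p "$tmp/streamx" "$tmp/kafka"
touch "$tmp/streamx/httpcore-4.2.4.jar"   # bundled with streamx
touch "$tmp/kafka/httpcore-4.4.3.jar"     # bundled with the Kafka install
# List the distinct httpcore jars on the (simulated) classpath.
versions=$(find "$tmp" -name 'httpcore-*.jar' -exec basename {} \; | sort -u)
echo "$versions"
# More than one line here means two versions race for the same classes.
echo "$versions" | wc -l
rm -rf "$tmp"
```

Whichever jar the classloader happens to pick first wins, which is why the failure shows up as a missing field (`INSTANCE`) rather than a missing class.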

zzbennett avatar Jan 24 '17 02:01 zzbennett

This is actually an issue with the httpclient version. After sleuthing around the classpath and the Maven dependency tree, it appears that the aws-java-sdk-s3 dependency, currently pinned at 1.11.69 in streamx, pulls in httpclient 4.5.1. It seems aws-java-sdk-s3 actually needs to be downgraded? I'm not sure how this is working for other folks. Downgrading aws-java-sdk-s3 to 1.10.77 pulls in httpclient 4.3.6, which resolves the java.lang.NoSuchFieldError: INSTANCE error; however, a new error appears:

Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2675)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:418)
	at com.qubole.streamx.s3.S3Storage.<init>(S3Storage.java:49)
	... 16 more

This is apparently a known incompatibility between Hadoop 2.7 and aws-java-sdk versions newer than 1.7.x. After trying a few different versions of aws-java-sdk-s3, I ended up just deleting the dependency entirely, which resolved the issue.
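For reference, the block I deleted looked roughly like this (a sketch; the 1.11.69 version is the one streamx pinned at the time, and the exact coordinates in streamx's pom.xml may differ). With it gone, hadoop-aws supplies the aws-java-sdk version Hadoop 2.7.x was compiled against (1.7.4), which avoids the TransferManager NoSuchMethodError.

```xml
<!-- Removed from streamx's pom.xml. Letting hadoop-aws resolve its own
     aws-java-sdk (1.7.4 for Hadoop 2.7.x) keeps S3AFileSystem and the
     SDK binary-compatible. -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-s3</artifactId>
  <version>1.11.69</version>
</dependency>
```

Running `mvn dependency:tree -Dincludes=com.amazonaws` before and after is a quick way to confirm which SDK version actually lands on the classpath.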

zzbennett avatar Jan 24 '17 19:01 zzbennett

Hi @zzbennett, sorry to respond late. Yes, we have found multiple issues with S3A (a thread leak and httpclient-related issues). So far the experience with NativeS3FileSystem has been very stable. Can you try that instead?

PraveenSeluka avatar Jan 24 '17 19:01 PraveenSeluka

Thanks for your reply @PraveenSeluka. I'm not able to use NativeS3FileSystem because it doesn't support AWS's temporary security credentials, which is what I'm using. There is a ticket open in the Hadoop community to add temporary-credential support to s3n, but they decided not to implement it since s3a already supports it, and (according to the third comment on that thread) they are not planning any further enhancements to the s3n connector. So, sadly, s3n will never support temporary security tokens, yet I cannot get s3a to work with streamx. The dependency issue went away when I deleted the aws-java-sdk-s3 dependency, though, so I'm unblocked there for now. I'm still unable to connect to S3 due to a 403 Access Denied error, so hopefully once that's resolved things will start working.
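For anyone else landing here: on a Hadoop build where HADOOP-12537 landed (2.8+; this is an assumption about your Hadoop version, 2.7 lacks it), s3a can be pointed at STS session credentials directly in core-site.xml. A hedged sketch, with placeholder values:

```xml
<!-- Sketch: s3a with temporary (STS session) credentials.
     Requires Hadoop 2.8+ (HADOOP-12537). Values are placeholders. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>SESSION_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>SESSION_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.session.token</name>
  <value>SESSION_TOKEN</value>
</property>
```

When running on EC2 with an instance role, leaving all of these unset and letting the SDK's default provider chain pick up the role is the simpler option.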

zzbennett avatar Jan 24 '17 22:01 zzbennett

@zzbennett You are right. They are not going to add role (temporary credential) support to s3n, and s3a is the way forward. I will look into this issue and get back soon.

PraveenSeluka avatar Jan 24 '17 22:01 PraveenSeluka

Regarding the S3 403 error: I resolved it by deleting the access_key and secret_key configs from the Hadoop hdfs-site.xml config file. Streamx seems to be working smoothly now. In the end, the only change I actually needed was deleting the aws-java-sdk-s3 dependency from the streamx pom.xml file.
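Concretely, the idea is that any statically configured keys override the role, so deleting them lets the AWS default provider chain fall through to the instance-profile credentials. A sketch of the kind of properties to remove (the exact names depend on which filesystem scheme the config was written for; values elided):

```xml
<!-- Delete entries like these from hdfs-site.xml / core-site.xml so
     role-based credentials are used instead of the stale static keys. -->
<property>
  <name>fs.s3a.access.key</name>       <!-- s3a form -->
  <value>...</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>...</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>   <!-- s3n form -->
  <value>...</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>...</value>
</property>
```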

zzbennett avatar Jan 24 '17 23:01 zzbennett

Yeah, that's right: you need to remove the keys (otherwise it won't use roles, and those keys are invalid). I will add a note for this.

PraveenSeluka avatar Jan 25 '17 00:01 PraveenSeluka

@zzbennett Please look at https://github.com/qubole/streamx/issues/30 for issues related to S3A.

PraveenSeluka avatar Feb 02 '17 04:02 PraveenSeluka