client.get("/").send().get() stuck

Open summershrimp opened this issue 8 years ago • 19 comments

I first used etcd4j version 2.11.0. Everything was okay, except that EtcdKeysResponse resp = etcdManagerClient.get(key).recursive().send().get(); throws an exception: Invalid field, cause: invalid value for "recursive", at index: 0. Then I updated etcd4j to 2.13.0, and now every call just gets stuck:

"Thread-15@8372" prio=5 tid=0x23 nid=NA waiting
  java.lang.Thread.State: WAITING
          at java.lang.Object.wait(Object.java:-1)
          at java.lang.Object.wait(Object.java:502)
          at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:254)
          at mousio.client.promises.ResponsePromise.waitForPromiseSuccess(ResponsePromise.java:189)
          at mousio.etcd4j.promises.EtcdResponsePromise.get(EtcdResponsePromise.java:58)
          at net.coding.git.service.DiscoveryService$Discovery.run(DiscoveryService.java:120)

"Thread-14@8373" prio=5 tid=0x22 nid=NA waiting
  java.lang.Thread.State: WAITING
          at java.lang.Object.wait(Object.java:-1)
          at java.lang.Object.wait(Object.java:502)
          at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:254)
          at mousio.client.promises.ResponsePromise.waitForPromiseSuccess(ResponsePromise.java:189)
          at mousio.etcd4j.promises.EtcdResponsePromise.get(EtcdResponsePromise.java:58)
          at net.coding.git.service.DiscoveryService$HeartBeat.run(DiscoveryService.java:179)
{
  "etcdserver": "2.3.7",
  "etcdcluster": "2.3.0"
}

summershrimp avatar Jan 20 '17 08:01 summershrimp

which etcd version ?

lburgazzoli avatar Jan 20 '17 08:01 lburgazzoli

@summershrimp

I've just added a small test which works on my side, are you able to provide a reproducer ?

lburgazzoli avatar Jan 20 '17 08:01 lburgazzoli

@summershrimp which version of jackson are you using ? etcd4j requires jackson > 2.8

lburgazzoli avatar Feb 28 '17 14:02 lburgazzoli

@lburgazzoli I have a similar problem. Calling client.getDir(root).recursive().timeout(TIMEOUT_SECS, TimeUnit.SECONDS).send().get().getNode() just hangs and then throws a timeout exception. Version 2.11 works fine; the problem is with 2.12 and 2.13.

I am running it against etcd Version: 3.1.5

viacheslav-fomin-main avatar Apr 11 '17 20:04 viacheslav-fomin-main

@viacheslav-fomin-main @summershrimp are you able to provide a reproducer ?

There is a small test about recursive usage which is ok so I'm unable to reproduce your issue.

Please check that you have jackson 2.8 in your runtime classpath.
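
Several replies in this thread come down to which Jackson version is actually loaded at runtime, which can differ from what the build file declares. A minimal sketch of one way to check, using reflection so that it compiles even when Jackson is absent: it assumes Jackson's PackageVersion classes with their public static VERSION field, and JacksonVersionCheck and describe are illustrative names, not part of etcd4j or Jackson.

```java
import java.lang.reflect.Field;
import java.security.CodeSource;
import java.util.Optional;

/** Reports which Jackson version (if any) is visible on the runtime classpath. */
final class JacksonVersionCheck {

    // Assumption: these generated classes expose a public static VERSION field.
    static final String CORE = "com.fasterxml.jackson.core.json.PackageVersion";
    static final String DATABIND = "com.fasterxml.jackson.databind.cfg.PackageVersion";

    /** Returns "version (jar location)" for the class, or empty if it cannot be loaded. */
    static Optional<String> describe(String className) {
        try {
            Class<?> type = Class.forName(className);
            Field field = type.getField("VERSION");
            Object version = field.get(null); // static field, so no instance is needed
            CodeSource source = type.getProtectionDomain().getCodeSource();
            String location = (source == null)
                    ? "unknown location"
                    : String.valueOf(source.getLocation());
            return Optional.of(version + " (" + location + ")");
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class missing, field missing, or class failed to link: treat as "not available".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("jackson-core:     " + describe(CORE).orElse("not on classpath"));
        System.out.println("jackson-databind: " + describe(DATABIND).orElse("not on classpath"));
    }
}
```

Running this with the same classpath as the application shows the jar each class was loaded from, which helps spot an older Jackson pulled in transitively.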

lburgazzoli avatar Apr 12 '17 05:04 lburgazzoli

I have the same issue: the send().get() call gets stuck,

and using a timeout has no effect in this situation.
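
When a blocking get() neither returns nor honors its timeout, one generic guard is to run the call on a worker thread and bound the wait from the outside. The following is a JDK-only sketch under that assumption; BoundedCall and callWithDeadline are hypothetical names, and the etcd request is represented by a plain Callable. It does not fix the underlying cause, it only stops the caller from waiting forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedCall {

    // Daemon threads, so a worker stuck in an uninterruptible wait cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task on a worker thread and waits at most the given time for its result. */
    static <T> T callWithDeadline(Callable<T> task, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker, which it may ignore
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a blocking call such as client.get(key).send().get().
        Callable<String> slow = () -> {
            Thread.sleep(5_000);
            return "late";
        };
        try {
            callWithDeadline(slow, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting");
        }
    }
}
```

The thread dump in the original report shows the wait happening in awaitUninterruptibly, so cancelling the future will probably not free the worker thread; each stuck call may leave one parked thread behind.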

lujiajing1126 avatar May 03 '17 08:05 lujiajing1126

@lujiajing1126 do you have jackson 2.8.x in your classpath ?

lburgazzoli avatar May 03 '17 08:05 lburgazzoli

Same problem here, with jackson 2.8.8.

wegel avatar May 09 '17 14:05 wegel

@wegel does this test work for you ?

lburgazzoli avatar May 09 '17 14:05 lburgazzoli

@lburgazzoli I haven't spent much time testing yet, but that test does work. However, the test tree in that test is very small, and when I simulate a tree with lots of directories and bigger values, I get an io.netty.handler.codec.TooLongFrameException. Setting the frame size on EtcdNettyConfig to something bigger seems to fix the issue (something like new EtcdClient(new EtcdNettyClient(new EtcdNettyConfig().setMaxFrameSize(100 * 100 * 1024)))), and this also seems to fix my issue in my actual code. Need more testing though to confirm.

wegel avatar May 09 '17 19:05 wegel

@wegel there is also a test for a huge dir, but it does not use a recursive get. Do you mind sending a PR with a test case that covers your case, so I can dig into it a little more?

lburgazzoli avatar May 10 '17 07:05 lburgazzoli

I was having this same issue, upgrading Jackson from 2.6.6 to 2.8.8 resolved it for me.

cmdln avatar Jul 07 '17 18:07 cmdln

I am having the exact same issue. I use the latest etcd4j with etcd version 2.3.8. The problem seems to be that if you do not specify a RetryPolicy (I used RetryOnce), it will continue to retry forever. It seems to be a problem with "large" responses (mine is 400 KB of JSON).

When I set the retrypolicy like this client.getDir(etcdClientBuilder.getEtcdDir()).setRetryPolicy(new RetryOnce(200)).send().get()

I got a stacktrace like this one

2017-08-24 14:55:31.109 INFO  [main] mousio.etcd4j.transport.EtcdNettyClient Setting up Etcd4j Netty client

mousio.client.exceptions.PrematureDisconnectException
	at mousio.etcd4j.transport.EtcdResponseHandler.channelUnregistered(EtcdResponseHandler.java:94)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelUnregistered(CombinedChannelDuplexHandler.java:405)

2017-08-24 14:55:31.780 INFO  [main] mousio.etcd4j.transport.EtcdNettyClient Shutting down Etcd4j Netty client
	at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
	at io.netty.channel.CombinedChannelDuplexHandler.channelUnregistered(CombinedChannelDuplexHandler.java:200)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1312)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
	at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:826)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:752)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:445)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:748)

Is there some kind of maximum size for the response that makes this fail? This is pretty confusing, as the JSON that etcd responds with is only 400 KB.

I followed the path in debug in my IDE, and every time etcd actually responds with a success. But for some reason that I cannot find, it just dies and runs a retry.
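
The behavior described here, where every attempt fails the same way and the client keeps retrying, is why a deterministic failure can look like a hang. As an illustration only, here is a bounded retry loop in plain Java. It is not etcd4j's RetryPolicy API; BoundedRetry and retry are hypothetical names, and the failing request is simulated.

```java
import java.util.concurrent.Callable;

final class BoundedRetry {

    /** Calls the task up to maxAttempts times, sleeping delayMs between failed attempts. */
    static <T> T retry(Callable<T> task, int maxAttempts, long delayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // do not retry after an interrupt
                throw e;
            } catch (Exception e) {
                last = e; // remember the failure and try again if attempts remain
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                }
            }
        }
        throw last; // every attempt failed: surface the last error instead of looping forever
    }

    public static void main(String[] args) {
        int[] calls = {0};
        try {
            // Stand-in for a request that fails identically every time (e.g. an oversized response).
            retry(() -> {
                calls[0]++;
                throw new IllegalStateException("response too large");
            }, 3, 10);
        } catch (Exception e) {
            System.out.println("failed after " + calls[0] + " attempts: " + e.getMessage());
        }
    }
}
```

With a cap on attempts, the caller gets the underlying exception after a few tries instead of waiting on a request that can never succeed.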

artheus avatar Aug 24 '17 13:08 artheus

You can configure the frame size like:

EtcdNettyConfig config = new EtcdNettyConfig();
config.setMaxFrameSize(1024 * 1024); // Desired max size
EtcdNettyClient nettyClient = new EtcdNettyClient(config, URI.create("http://localhost:4001")); 
EtcdClient etcdClient = new EtcdClient(nettyClient);

lburgazzoli avatar Aug 24 '17 13:08 lburgazzoli

It seems that the problem is actually the Netty configuration.

private int maxFrameSize = 1024 * 100;

is one of the lines in the mousio.etcd4j.transport.EtcdNettyConfig class. This means that it limits the maximum size of responses for Netty to 100 KB (which is very small).

There is a way to resolve this. Use -Dmousio.etcd4j.maxFrameSize=1048576 (1 MB) or something like that to increase the limit. You will get a deprecation warning about setting the frame size through a system property, but it should resolve your problem! This is very confusing, and I suggest that the default be changed to a much larger number, e.g. 100 MB or something like that.
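
To make the numbers concrete, here is the arithmetic as a small sketch. It assumes a response is rejected when its size exceeds the configured limit and ignores header overhead; FrameSizeMath and fits are illustrative names, not etcd4j code.

```java
final class FrameSizeMath {

    static final int DEFAULT_MAX_FRAME = 1024 * 100; // 102,400 bytes, the default quoted above
    static final int ONE_MIB = 1024 * 1024;          // 1,048,576 bytes, the suggested value

    /** True if a response body of the given size fits within maxFrameSize bytes. */
    static boolean fits(long responseBytes, long maxFrameSize) {
        return responseBytes <= maxFrameSize;
    }

    public static void main(String[] args) {
        long response = 400L * 1024; // roughly the 400 KB response reported in this thread
        System.out.println("default limit: " + DEFAULT_MAX_FRAME
                + " bytes, fits=" + fits(response, DEFAULT_MAX_FRAME));
        System.out.println("1 MiB limit:   " + ONE_MIB
                + " bytes, fits=" + fits(response, ONE_MIB));
    }
}
```

A 400 KB response is about four times the default limit, so it fails on every attempt, while a 1 MiB limit leaves room for it.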

Hope this helps all of you!

artheus avatar Aug 24 '17 13:08 artheus

doesn't config.setMaxFrameSize(1024 * 1024); make any difference ?

lburgazzoli avatar Aug 24 '17 13:08 lburgazzoli

@lburgazzoli Your way should work fine! But I think that the limit should be larger by default, or the default retryPolicy should retry N times rather than forever. As you can see, I made a pull request bumping maxFrameSize to 100 MB.

artheus avatar Aug 24 '17 13:08 artheus

I was having this same issue, upgrading Jackson from 2.7.2 to 2.8.8 resolved it for me.

addname avatar Oct 18 '17 05:10 addname

With jackson 2.9.2 the same problem occurs. But doing config.setMaxFrameSize(1024 * 1024 * 100); doesn't help :/

With jackson 2.8.6 everything works out of the box, no need to change maxFrameSize.

kpbochenek avatar Nov 03 '17 15:11 kpbochenek