
org.eclipse.jetty.client.HttpRequest.send() never returns


Jetty version(s): 11.0.11

Java version/vendor (use: java -version)

openjdk 18.0.2.1 2022-08-18
OpenJDK Runtime Environment (build 18.0.2.1+1-1)
OpenJDK 64-Bit Server VM (build 18.0.2.1+1-1, mixed mode, sharing)

OS type/version: Microsoft Windows [Version 10.0.19044.2006]

Description

How to reproduce?

I'm seeing an intermittent situation where invoking org.eclipse.jetty.client.HttpRequest.send() blocks forever even though I'm using the default idle timeout of 30 seconds.

Here is the stack dump: stacktrace.txt

Here is a summary from IntelliJ:

248 waiting threads:

"worker-6@6744" prio=5 tid=0x29 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
	  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
	  at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:95)
	  at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:771)
	  at io.bazari.client.java.util.HttpRequestContext.send(HttpRequestContext.java:129)
	  at io.bazari.collector.splinterlands.processor.DownloadBlock.process(DownloadBlock.java:89)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient.processNextBlock(SplinterlandsClient.java:105)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient$$Lambda$7755/0x0000000801682268.run(Unknown Source:-1)
	  at io.bazari.common.exception.WrappingException.wrap(WrappingException.java:73)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient.lambda$run$0(SplinterlandsClient.java:152)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient$$Lambda$7753/0x0000000801681e48.run(Unknown Source:-1)
	  at io.bazari.client.java.processor.WorkScheduler.lambda$execute$0(WorkScheduler.java:76)
	  at io.bazari.client.java.processor.WorkScheduler$$Lambda$7754/0x0000000801682058.run(Unknown Source:-1)
	  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
	  at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
	  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
	  at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

8 runnable threads:

"HttpClient-1254@10319" prio=5 tid=0x4e6 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at sun.nio.ch.WEPoll.wait(WEPoll.java:-1)
	  at sun.nio.ch.WEPollSelectorImpl.doSelect(WEPollSelectorImpl.java:111)
	  at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
	  - locked <merged>(a sun.nio.ch.WEPollSelectorImpl)
	  - locked <merged>(a sun.nio.ch.Util$2)
	  at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
	  at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:180)
	  at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:187)
	  at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:604)
	  at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:541)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:446)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:239)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$1992/0x000000080115e000.run(Unknown Source:-1)
	  at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
	  at java.lang.Thread.run(Thread.java:833)

1 waiting thread:

"HttpClient@61b65d54-scheduler-1@9457" prio=5 tid=0x427 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:506)
	  at java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3464)
	  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3435)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1623)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1170)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

34 waiting threads:

"sshd-SshClient[5b88b8e]-nio2-thread-1@7421" daemon prio=5 tid=0x26a nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:506)
	  at java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3464)
	  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3435)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1623)
	  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

main thread waiting on:

"main@1" prio=5 tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
	  at java.util.concurrent.Semaphore.acquire(Semaphore.java:318)
	  at io.bazari.client.java.processor.WorkScheduler$FlowControl.beforeTask(WorkScheduler.java:160)
	  at io.bazari.client.java.processor.WorkScheduler.execute(WorkScheduler.java:69)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient.run(SplinterlandsClient.java:152)
	  at io.bazari.collector.Collector.run(Collector.java:47)
	  at io.bazari.collector.Collector.main(Collector.java:28)

1 runnable thread:

"logback-1@1586" daemon prio=5 tid=0x1a nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

2 waiting threads:

"Thread-1002@8083" daemon prio=5 tid=0x41c nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

1 waiting thread:

"scheduler-0@18576" prio=5 tid=0x83a nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

996 waiting threads:

"HttpClient-1053@9359" prio=5 tid=0x41d nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:219)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:975)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1025)
	  at java.lang.Thread.run(Thread.java:833)

1 waiting thread:

"Common-Cleaner@23678" daemon prio=8 tid=0x15 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
	  at jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:140)
	  at java.lang.Thread.run(Thread.java:833)
	  at jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:162)

1 runnable thread:

"Thread-1@6701" daemon prio=5 tid=0x22 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at sun.nio.ch.Iocp.getQueuedCompletionStatus(Iocp.java:-1)
	  at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:323)
	  at java.lang.Thread.run(Thread.java:833)

1 sleeping thread:

"idle-connection-reaper@3799" daemon prio=5 tid=0x1d nid=NA sleeping
  java.lang.Thread.State: TIMED_WAITING
	  at java.lang.Thread.sleep(Thread.java:-1)
	  at software.amazon.awssdk.http.apache.internal.conn.IdleConnectionReaper$ReaperTask.run(IdleConnectionReaper.java:150)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

1 runnable thread:

"Reference Handler@23674" daemon prio=10 tid=0x2 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at java.lang.ref.Reference.waitForReferencePendingList(Reference.java:-1)
	  at java.lang.ref.Reference.processPendingReferences(Reference.java:253)
	  at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:215)

1 waiting thread:

"Finalizer@23675" daemon prio=8 tid=0x3 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:176)
	  at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:183)

3 runnable threads:

"Signal Dispatcher@23676" daemon prio=9 tid=0x4 nid=NA runnable
  java.lang.Thread.State: RUNNABLE

As far as I can tell, my worker threads are waiting on org.eclipse.jetty.client.HttpRequest.send() to return and it's not clear what Jetty is waiting on. Any ideas?

cowwoc avatar Sep 14 '22 19:09 cowwoc

A few more points:

  • I cannot share the application because it's commercial.
  • I am using a single HttpClient instance with 1000 different proxies.
  • Each proxy is mapped to a separate SOCKS tunnel that runs over SSH to a different remote computer.

Here is my HttpClient initialization code:

QueuedThreadPool clientExecutor = new QueuedThreadPool(64000000);
clientExecutor.setName(HttpClient.class.getSimpleName());

List<Proxy> proxies = scope.getNetworkProxies();

HttpClient client = new HttpClient();
client.setExecutor(scope.getClientExecutor());
client.setUserAgentField(new HttpField(HttpHeader.USER_AGENT, USER_AGENT));
client.setMaxConnectionsPerDestination(64000000);

List<ProxyConfiguration.Proxy> clientProxies = client.getProxyConfiguration().getProxies();
int proxyNumber = 0;
for (Proxy proxy : proxies)
{
	InetSocketAddress address = (InetSocketAddress) proxy.address();
	Socks4Proxy socksProxy = new Socks4Proxy(new Origin.Address(address.getHostString(), address.getPort()), false);
	TaggedProxy taggedProxy = new TaggedProxy(socksProxy, "proxy" + proxyNumber++);
	clientProxies.add(taggedProxy);
}
try
{
	client.start();
}
catch (Exception e)
{
	throw WrappingException.wrap(e);
}

I doubt I'll be able to reproduce this with Jetty verbose logging enabled because the issue might be timing-related.

cowwoc avatar Sep 14 '22 20:09 cowwoc

Is this HttpClient only using HTTP/1 or is it set up for HTTP/2 as well?

joakime avatar Sep 14 '22 20:09 joakime

@joakime I'm fairly sure I'm only using HTTP/1, but how can I check to be sure?
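
For reference, one way to check might be to inspect the client's transport; the no-argument HttpClient constructor uses the HTTP/1.1 transport only, so a minimal sketch along these lines (the class name is illustrative) would confirm it:

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.http.HttpClientTransportOverHTTP;

public class CheckTransport
{
	public static void main(String[] args)
	{
		HttpClient client = new HttpClient();
		// The no-arg constructor uses the HTTP/1.1 transport; HTTP/2 requires
		// passing an HTTP/2 or dynamic transport to the constructor explicitly.
		System.out.println("HTTP/1.1 transport: "
			+ (client.getTransport() instanceof HttpClientTransportOverHTTP));
	}
}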

PS: I tried reproducing the issue with DEBUG logs enabled and was unable to, but it occurs (eventually) if Jetty's logging is set to WARN.

cowwoc avatar Sep 14 '22 20:09 cowwoc

Another thing that comes to mind is that I'm using IntelliJ's HotSwap feature to modify the application's code without restarting the JVM. Is it possible that hotswap is somehow screwing up Jetty's state?

I'll let you know if I run across this bug without using hotswap.

cowwoc avatar Sep 14 '22 23:09 cowwoc

This looks like a real bug. I've reproduced it without the use of hotswap, even running completely from the command line without any debugging taking place.

How do we go about debugging this further?

cowwoc avatar Sep 19 '22 08:09 cowwoc

In the minutes preceding this hang, I see request response times increase from a few seconds to multiple minutes.

I'm not sure if this is thread starvation or some sort of network throttling introduced by the servers I am hitting.

Is it theoretically possible for a server to keep a connection alive forever yet send no actual data over the wire (as a form of network throttling)? If so, are there any sort of timeouts I can set on the client side that would force the connection to timeout?

UPDATE: Sorry, I see now that this is what HttpClient.setIdleTimeout() is meant to handle, and I am using the default value of 30 seconds. Maybe the connection is not totally idle? Maybe the server is responding extremely slowly (one character every 29 seconds)? Is there a way to detect/handle that possibility?

cowwoc avatar Sep 20 '22 17:09 cowwoc

I think I found a solution. I registered event listeners with Request.onRequestBegin() and Request.onResponseBegin() that invoke Request.abort() or Response.abort() if the overall request time surpasses some limit. I'm pretty sure that this will solve the aforementioned hangs.
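
As a rough sketch, one way to wire such a limit is with a shared scheduler that aborts the request unless it completes first (a simplified variant of the listener approach described above; the class and method names are illustrative):

import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.eclipse.jetty.client.api.Request;

public final class RequestTimeLimiter
{
	/**
	 * Aborts {@code request} unless it completes within {@code seconds}.
	 */
	public static void abortAfter(Request request, ScheduledExecutorService scheduler, long seconds)
	{
		ScheduledFuture<?> abortTask = scheduler.schedule(
			() -> request.abort(new TimeoutException("total request time exceeded")),
			seconds, TimeUnit.SECONDS);
		// Cancel the pending abort once the exchange completes, successfully or not.
		request.onComplete(result -> abortTask.cancel(false));
	}
}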

I'll let you know in a couple of days.

cowwoc avatar Sep 20 '22 22:09 cowwoc

Unfortunately, the hang still occurs. I would appreciate your advice on how to best track this back to the source. It's difficult to figure out because the issue is intermittent. Are there any sort of event listeners I should make use of? Any sort of logging I should enable?

Remember, if I enable too many logs the problem seems to go away... so it's probably some sort of race condition.

cowwoc avatar Sep 22 '22 01:09 cowwoc

@joakime @sbordet I managed to reproduce the problem with DEBUG logs enabled. Can one of the committers please work with me to isolate which portions of the log file you want me to share? I've got 16GB worth, so I assume I can't attach the full thing.

cowwoc avatar Sep 23 '22 19:09 cowwoc

Any idea what would cause an HTTP request to stay in the queued state forever? I've got maxConnectionsPerDestination set to Integer.MAX_VALUE and the executor is Executors.newThreadPerTaskExecutor(), which again should have no limit. In this case, do you ever expect requests to queue for a long time?
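
One way to see what a queued request is waiting on might be Jetty's component dump, which prints the client's component tree including per-destination connection pools (a minimal sketch; the class name is illustrative):

import org.eclipse.jetty.client.HttpClient;

public class DumpClientState
{
	public static void main(String[] args) throws Exception
	{
		HttpClient client = new HttpClient();
		client.start();
		// ContainerLifeCycle.dump() prints the component tree, including
		// per-destination state, which can show where exchanges are stuck.
		System.out.println(client.dump());
		client.stop();
	}
}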

Also, does invoking Request.abort() also abort a response if it's in progress? Or do I need to invoke Response.abort() separately?

Thank you.

cowwoc avatar Sep 28 '22 06:09 cowwoc

@cowwoc I'll look into this. Are your logs 16 GiB compressed? Can you split the compressed file and attach the single split files?

sbordet avatar Sep 28 '22 12:09 sbordet

@cowwoc the DEBUG logs will be great if you can provide them.

My impression is that this has something to do with the SOCKS configuration.

The DNS resolution should be subject to HttpClient.addressResolutionTimeout, but only if it's SocketAddressResolver.Async. Make sure you use that.

After the address is resolved, the connect attempt is subject to the HttpClient.connectTimeout. So again it should not wait indefinitely.

After the connection is created, we have an EndPoint which is subject to HttpClient.idleTimeout.
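
For reference, a minimal sketch of how those three timeouts might be configured (the values are illustrative, not recommendations):

import org.eclipse.jetty.client.HttpClient;

public class ClientTimeouts
{
	public static void main(String[] args) throws Exception
	{
		HttpClient client = new HttpClient();
		// DNS resolution timeout, applied by the default SocketAddressResolver.Async.
		client.setAddressResolutionTimeout(15_000);
		// TCP connect attempt timeout.
		client.setConnectTimeout(15_000);
		// Idle timeout for established connections (30 seconds is the default).
		client.setIdleTimeout(30_000);
		client.start();
	}
}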

It could be that we don't handle well the case where the SOCKS proxy simply never replies to the client. I'll try to write a test case for this and report back.

What class is TaggedProxy?

sbordet avatar Sep 29 '22 17:09 sbordet

@sbordet Sorry for the delayed response. The original logs I collected did not contain enough information to track the transition from application threads to Jetty's threads and back. As usually happens in such cases, adding logs made the problem harder to reproduce. I'll let you know as soon as I've got updated logs showing this problem.

What's weird is that I've added a scheduled task that invokes Request.abort() after ~30 seconds. You'd think that this would make the problem go away, but that doesn't seem to be the case.

Would synchronous SOCKS address resolution abort correctly when Request.abort() is invoked? Or could it block?

TaggedProxy is a mechanism you suggested in another issue. It enables users to select which SOCKS proxy a particular Request will use. Here is the implementation:

/**
 * Selects a proxy based on the Request's tag.
 */
private static final class TaggedProxy extends ProxyConfiguration.Proxy
{
	private final ProxyConfiguration.Proxy delegate;

	/**
	 * @param delegate the proxy to delegate to
	 * @param tag      the tag associated with the proxy
	 * @throws NullPointerException if {@code delegate} is null
	 */
	public TaggedProxy(ProxyConfiguration.Proxy delegate, Object tag)
	{
		this(delegate, delegate.getOrigin(), tag);
	}

	/**
	 * @param delegate       the proxy to delegate to
	 * @param delegateOrigin the origin of the delegate
	 * @param tag            the tag associated with the proxy
	 * @throws NullPointerException if {@code delegate} or {@code delegateOrigin} are null
	 */
	private TaggedProxy(ProxyConfiguration.Proxy delegate, Origin delegateOrigin, Object tag)
	{
		super(new Origin(delegateOrigin.getScheme(), delegateOrigin.getAddress(), tag,
			delegateOrigin.getProtocol()), delegate.getSslContextFactory());
		this.delegate = delegate;
	}

	@Override
	public boolean matches(Origin origin)
	{
		return super.matches(origin) && Objects.equals(origin.getTag(), getOrigin().getTag());
	}

	@Override
	public ClientConnectionFactory newClientConnectionFactory(ClientConnectionFactory connectionFactory)
	{
		return delegate.newClientConnectionFactory(connectionFactory);
	}

	@Override
	public Origin.Address getAddress()
	{
		return delegate.getAddress();
	}

	@Override
	public boolean isSecure()
	{
		return delegate.isSecure();
	}

	@Override
	public SslContextFactory.Client getSslContextFactory()
	{
		return delegate.getSslContextFactory();
	}

	@Override
	public Origin.Protocol getProtocol()
	{
		return delegate.getProtocol();
	}

	@Override
	public Set<String> getIncludedAddresses()
	{
		return delegate.getIncludedAddresses();
	}

	@Override
	public Set<String> getExcludedAddresses()
	{
		return delegate.getExcludedAddresses();
	}

	@Override
	public URI getURI()
	{
		return delegate.getURI();
	}
}

cowwoc avatar Sep 29 '22 18:09 cowwoc

@sbordet Got it!

  1. I ran a debug session in IntelliJ with Jetty DEBUG-level logs enabled.
  2. The application ran the following code in some grandiose loop:
AtomicReference<RequestState> requestState = new AtomicReference<>();
try
{
request.onRequestQueued(theRequest ->
{
	requestState.set(RequestState.REQUEST_QUEUED);
	log.info("[{}] requestQueued, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestBegin(theRequest ->
{
	requestState.set(RequestState.REQUEST_BEGIN);
	log.info("[{}] requestBegin, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestContent((theRequest, content) ->
{
	requestState.set(RequestState.REQUEST_CONTENT);
	log.info("[{}] requestContent, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestHeaders(theRequest ->
{
	requestState.set(RequestState.REQUEST_HEADERS);
	log.info("[{}] requestHeaders, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestSuccess(theRequest ->
{
	requestState.set(RequestState.REQUEST_SUCCESS);
	log.info("[{}] requestSuccess, tag: {}",
		System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestFailure((theRequest, cause) ->
{
	requestState.set(RequestState.REQUEST_FAILURE);
	log.info("[{}] requestFailure, tag: {}",
		System.identityHashCode(theRequest), theRequest.getTag(), cause);
});
request.onRequestCommit(theRequest ->
{
	requestState.set(RequestState.REQUEST_COMMIT);
	log.info("[{}] requestCommit, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onResponseBegin(theResponse ->
{
	requestState.set(RequestState.RESPONSE_BEGIN);
	log.info("[{}] ResponseBegin, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseContent((theResponse, content) ->
{
	requestState.set(RequestState.RESPONSE_CONTENT);
	log.info("[{}] responseContent, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseHeaders(theResponse ->
{
	requestState.set(RequestState.RESPONSE_HEADERS);
	log.info("[{}] responseHeaders, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseSuccess(theResponse ->
{
	requestState.set(RequestState.RESPONSE_SUCCESS);
	log.info("[{}] responseSuccess, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseFailure((theResponse, failure) ->
{
	requestState.set(RequestState.RESPONSE_FAILURE);
	log.info("[{}] responseFailure, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag(), failure);
});
request.onComplete(result ->
{
	requestState.set(RequestState.COMPLETE);
	log.info("[{}] complete, tag: {}", System.identityHashCode(result.getRequest()),
		result.getRequest().getTag(), result.getFailure());
});
response = request.send();

[...]

private enum RequestState
{
	REQUEST_QUEUED,
	REQUEST_BEGIN,
	REQUEST_CONTENT,
	REQUEST_HEADERS,
	REQUEST_SUCCESS,
	REQUEST_FAILURE,
	REQUEST_COMMIT,
	RESPONSE_BEGIN,
	RESPONSE_CONTENT,
	RESPONSE_HEADERS,
	RESPONSE_SUCCESS,
	RESPONSE_FAILURE,
	COMPLETE
}
  3. Eventually I got the error I was looking for:
     14:38:08.226 [main] INFO i.b.c.j.p.WorkScheduler$FlowControl.beforeTask() - Waiting to acquire a permit: FlowControl.WorkScheduler
  4. I paused the debugging session and took a thread dump.
  5. There were 6 threads that were blocked on HttpRequest.send().
  6. I jumped into the stack of each one and checked the value of requestState. They were all in the REQUEST_SUCCESS state.
  7. Assuming that the next event after onRequestSuccess() is onResponseBegin(), it means that something is preventing the transition.

I am attaching the thread dump for your review. The log output is 47GB uncompressed. Are you able to extract RAR or 7ZIP formats if I use them instead of classic zip? The RAR compressed size should be around 2GB.

thread-dump.txt

cowwoc avatar Sep 29 '22 19:09 cowwoc

The DNS resolution should be subject to HttpClient.addressResolutionTimeout, but only if it's SocketAddressResolver.Async.

It looks like this is the default so I should be good: https://github.com/eclipse/jetty.project/blob/e3b0e80857cd3f2218f925702b22466a0c4e8bf7/jetty-client/src/main/java/org/eclipse/jetty/client/HttpClient.java#L211

It could be that we don't handle well the case where the SOCKS proxy just does not reply ever to the client. I'll try to write a test case for this and report back.

That, or the server itself could be hanging the response forever as a form of throttling.

Were you successful in reproducing this in a testcase?

cowwoc avatar Sep 29 '22 21:09 cowwoc

Looking at https://github.com/apache/mina-sshd/blob/65b80b57c9a9b474e8b42a6db6908425e92680a6/sshd-core/src/main/java/org/apache/sshd/core/CoreModuleProperties.java#L280 and https://github.com/apache/mina-sshd/blob/65b80b57c9a9b474e8b42a6db6908425e92680a6/sshd-core/src/main/java/org/apache/sshd/core/CoreModuleProperties.java#L338, it looks like the default sshd read timeouts are infinite.

If the server does not send a reply then sshd will never time out, and it is left to Jetty to enforce some sort of timeout. I assume you already handle this case when there is no proxy...? If so, I would expect the same timeout configuration option to be used when a proxy exists.
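
If the sshd side should also be bounded, something along these lines might work (hedged: the property name comes from the files linked above, and the Duration-based setter is an assumption about the mina-sshd Property API):

import java.time.Duration;

import org.apache.sshd.client.SshClient;
import org.apache.sshd.core.CoreModuleProperties;

public class SshReadTimeout
{
	public static void main(String[] args) throws Exception
	{
		SshClient sshClient = SshClient.setUpDefaultClient();
		// Assumption: bounds how long an NIO2 read may wait before the session fails.
		CoreModuleProperties.NIO2_READ_TIMEOUT.set(sshClient, Duration.ofSeconds(60));
		sshClient.start();
	}
}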

cowwoc avatar Sep 30 '22 14:09 cowwoc

I might have missed it, but how is SSH involved here? You open an SSH tunnel and you send SOCKS through that SSH tunnel?

Also, can you clarify, when you send the request, if you set a total timeout, an idle timeout and whether you use the blocking send() API?

sbordet avatar Sep 30 '22 14:09 sbordet

@cowwoc can you build and try the changes in branch jetty-10.0.x-8584-socks-idle-timeout? They should apply cleanly to Jetty 11.

sbordet avatar Sep 30 '22 14:09 sbordet

I might have missed it, but how is SSH involved here? You open an SSH tunnel and you send SOCKS through that SSH tunnel?

Correct.

Also, can you clarify, when you send the request, if you set a total timeout, an idle timeout and whether you use the blocking send() API?

I couldn't find a mechanism to enforce a total timeout in Jetty (let me know if I missed it), so I implemented my own. I schedule Request.abort() to run sometime in the future. If send() returns first, it cancels the event. Otherwise, I am assuming the pending request/response should abort correctly.

I did not set an idle timeout before running into this bug. After I did, I decided to set a total-request timeout instead because a malicious server could game idleTimeout by sending bytes ~30 seconds apart. If the request or response body is very long the server could throttle the client for a very long period of time.

Yes, I am using the blocking send() API.

I'll let you know if jetty-10.0.x-8584-socks-idle-timeout works. Last time it took 2 hours to reproduce the hang so I'll only be able to get back to you in a couple of hours.

cowwoc avatar Sep 30 '22 14:09 cowwoc

FTR, request.timeout(...) enforces a total timeout on the request/response cycle, no need to write your own scheduling of Request.abort().
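
For example, a minimal sketch of a total timeout on a blocking send (the URL and duration are placeholders):

import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse;

public class TotalTimeoutExample
{
	public static void main(String[] args) throws Exception
	{
		HttpClient client = new HttpClient();
		client.start();
		// Aborts the whole request/response cycle after 60 seconds,
		// causing the blocking send() to throw a TimeoutException.
		ContentResponse response = client.newRequest("https://example.com/")
			.timeout(60, TimeUnit.SECONDS)
			.send();
		System.out.println(response.getStatus());
		client.stop();
	}
}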

sbordet avatar Sep 30 '22 15:09 sbordet

@sbordet I'm 1:15 hours into the test and so far so good. Maybe it's a coincidence, but the transfer rates I'm getting now are way better than before... If more and more connections were converted into zombies in the past, it's conceivable that this would reduce the overall transfer speed.

Assuming that this fix works, is there a workaround I can use in production until you publish an official release? Or am I dependent on this private build?

cowwoc avatar Sep 30 '22 17:09 cowwoc

~Nevermind. I just got a hang :(~ There is a chance the test was using the wrong Jetty build. I am going to clean/build and try again.

cowwoc avatar Sep 30 '22 17:09 cowwoc

@sbordet I think your fix works. I left the application running for 5 hours and it didn't hang once.

Going back to my earlier question: is there a workaround that I can use in production until the fix makes it into an official release? I assume the next release is 3 months away which is an eternity for me :)

cowwoc avatar Sep 30 '22 23:09 cowwoc

@cowwoc I think that if you set a total timeout on your requests you can work around the immediate problem of requests never returning. However, the proper fix would work better, so you'll have to wait for the next release.

sbordet avatar Oct 03 '22 09:10 sbordet

@cowwoc do you confirm that the changes fix this issue?

sbordet avatar Oct 04 '22 13:10 sbordet

@sbordet Yes. It looks fixed to me. Thank you.

cowwoc avatar Oct 04 '22 18:10 cowwoc

@sbordet Apologies, but I think this bug is not fixed. Should I open a new bug report or do you prefer re-opening this ticket?

To clarify, I checked out the v11 branch (commit 77ceb4172dde2e6c0c419f378b91126ce1e8e4a0), built a local SNAPSHOT version and used that to run my application.

I ended up with a hang which again seems centered around send() never returning, in spite of a request timeout being set. I am attaching an updated thread dump for your review. Please note that I am using the new thread dump format exported by jcmd pid Thread.dump_to_file, because it is the only way to retrieve the stack traces of virtual threads.

thread-dump.txt

For example, take a look at Thread #71683 "BlockingTask-71427" virtual.

Also, you should note that I discovered a JDK bug a few days back that might be relevant: https://youtrack.jetbrains.com/issue/IDEA-305072

The above hang occurred while running my application under IDEA in debug mode. I have never seen this bug affect virtual threads before, but anything is possible...

Let me know if you need me to collect any other logs, and what you think of the thread dump.

cowwoc avatar Nov 07 '22 06:11 cowwoc

@cowwoc, first of all, virtual threads are experimental, preview, etc., so this could be an OpenJDK bug. The stack trace alone unfortunately is not enough to diagnose any problem.

Does it happen when you run without virtual threads?

Do you have DEBUG logs for this failure?

sbordet avatar Nov 08 '22 14:11 sbordet

@sbordet I think you're right. Upon digging further into the issue, I noticed that the problem went away if I switched from thread-per-request to a fixed-size thread pool. Eventually this led me to the discovery of a memory leak in the JDK as well as in the IDE I was using. Here are the relevant links in case anyone else is interested:

https://youtrack.jetbrains.com/issue/IDEA-305072
https://bugs.openjdk.org/browse/JDK-8296463

The JDK issues have been fixed. Once IDEA fixes their memory leak I will retest and hopefully the hang will disappear.

cowwoc avatar Nov 23 '22 11:11 cowwoc