pulsar [Broker] Increase default numHttpServerThreads value to 50 to prevent Admin API unavailability

Motivation

Since Pulsar Admin API uses the blocking servlet API, it is possible that all Jetty threads are occupied and this causes unavailability on the Pulsar Admin API. The default value for the maximum number of threads for Jetty is too low in Pulsar. That is the root cause of many problems where Pulsar Admin API is unavailable when all threads are in use.

Additional context

Mailing list thread about "make async" changes: https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l

Related issues/PRs #13666 #4756 #10619
PIP-142

Modification

Jetty defaults to 200 maximum threads, to prevent thread pool starvation. Make Pulsar use the value of 50 maximum threads by setting numHttpServerThreads=50.
Update the documentation for numHttpServerThreads

Feb 16 '22 13:02 lhotari

What's the reason of setting the default value to 200? If the node just have one core, what will happen? Please send email to dev mail list to discuss.

These are threads. Jetty defaults to 200 maximum threads, to prevent thread pool starvation. This is recommended when using blocking Servlet API. The problem is that Pulsar uses the blocking servlet API and doesn't have a sufficient amount of threads which are needed and recommended.

The value 200 doesn't mean that there will be 200 threads to start with. This is the maximum size for the thread pool. When the value is more than 8, Jetty will start with 8 initial threads and add more threads to the pool when all threads are occupied.

I have already started an email discussion to discuss this topic. Please reply to https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl .

There is useful background information in https://lists.apache.org/thread/hso8qwsv40ccrk116fj5ggdpt3b9d4g4 . I wrote that reply before I noticed Penghui's response. It contains a link to Jetty's documenation about asynchronous servlets: https://wiki.eclipse.org/Jetty/Feature/Continuations#Why_Asynchronous_Servlets_.3F .

Feb 16 '22 14:02 lhotari

These are threads. Jetty defaults to 200 maximum threads,

@lhotari If this is jetty defaults. Can we just leave it blank?

Feb 16 '22 14:02 Jason918

Change the default configuration need to start with a proposal

This PR is a proposal. I have also made this proposal on the dev mailing list in the discussion. https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl . What else is needed?

In this case, the previous default for numHttpServerThreads is simply too small and invalid when blocking servlet API is used. The value 200 doesn't mean that there will be 200 threads to start with. This is the maximum size for the thread pool. When the value is more than 8, Jetty will start with 8 initial threads and add more threads to the pool when all threads are occupied.

There is no breaking change in increasing the default value to 200. It's just an improvement and fixes "the problem" where Admin API goes unresponsive when all threads are occupied.

We might end up setting the default value to something lower than 200. A value like 50 or 100 might be fine. I just think that 200 is a good default since Jetty also uses that as the default value.

The main overhead of a thread is the amount of memory that the stack of each thread consumes. It's 1MB by default. 200 threads will consume 200MB of RSS memory in the thread stacks.

Feb 16 '22 14:02 lhotari

@lhotari If this is jetty defaults. Can we just leave it blank?

@Jason918 no. Pulsar overrides the default with the value set in numHttpServerThreads in the configuration.

Pulsar code locations:

https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L79-L81

https://github.com/apache/pulsar/blob/adcbe0f118ece0999b8603f37010194b44c241b4/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/web/WebExecutorThreadPool.java#L33-L36

Feb 16 '22 14:02 lhotari

These are threads. Jetty defaults to 200 maximum threads

@lhotari Do we need to take the number of availableProcessors into consideration for the maximum threads of the thread pool?

Feb 16 '22 14:02 hangc0276

This doesn't seem to solve the root cause of the issue.

Please tell me what is "the issue" that you are referring to?

This PR is a proposal. I have also made this proposal on the dev mailing list in the discussion. https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl . What else is needed?

I think we need a PIP to get this change approved.

The Pulsar community makes major decisions on the dev mailing list according to the Apache Way. The mailing list is the place to decide whether this change needs a PIP or not. Please participate in the existing dev mailing list discussion: https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl

Feb 16 '22 14:02 lhotari

@lhotari @eolivelli I suggest that we should consider the system cpu cores, it may be hurtful change for people who run pulsar in a low machine, like one cpu core.

Feb 16 '22 14:02 hezhangjian

The Pulsar community makes major decisions on the dev mailing list according to the Apache Way. The mailing list is the place to decide whether this change needs a PIP or not. Please participate in the existing dev mailing list discussion: https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl

For changing the default configuration, need to approve through a PIP, here are some good examples https://github.com/apache/pulsar/issues?q=is%3Aissue+label%3APIP+assignee%3Amerlimat+is%3Aclosed

And please use a separate thread to discuss, let's focus on the 2.10.0 release on the 2.10.0 release thread. It's not a breaking change introduced in 2.10.0, not a blocker in 2.10.0

Feb 16 '22 14:02 codelipenghui

These are threads. Jetty defaults to 200 maximum threads

@lhotari Do we need to take the number of availableProcessors into consideration for the maximum threads of the thread pool?

No. Jetty is different from Netty in this aspect. In Netty, everything should be asynchronous and "thou shall never block". In Jetty, the maximum number of threads for the thread pool should be set to 50-500 threads and blocking operations are fine.

The recommendation for the thread pool is explained in Jetty documentation https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool

Thread Pool Configure with goal of limiting memory usage maximum available. Typically this is >50 and <500

In Jetty there's also a "acceptor" and "selector" thread count settings. These have been fixed to 1 in Pulsar.

Acceptors The standard rule of thumb for the number of Accepters to configure is one per CPU on a given machine.

@Shoothzj The number of acceptors should take the system cpu cores into account. However that is a different setting and Pulsar now uses a fixed number there: 1.

http port: https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L88

https port: https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L125

acceptors and selectors should be both set to -1. Jetty would pick the recommended count based on cores in that case.

Feb 16 '22 14:02 lhotari

@lhotari @eolivelli I suggest that we should consider the system cpu cores, it may be hurtful change for people who run pulsar in a low machine, like one cpu core.

no it doesn't hurt performance. I explained this in my reply above.

Feb 16 '22 15:02 lhotari

For changing the default configuration, need to approve through a PIP, here are some good examples https://github.com/apache/pulsar/issues?q=is%3Aissue+label%3APIP+assignee%3Amerlimat+is%3Aclosed

Thanks, I'll create a PIP.

And please use a separate thread to discuss, let's focus on the 2.10.0 release on the 2.10.0 release thread. It's not a breaking change introduced in 2.10.0, not a blocker in 2.10.0

I'll start a new thread.

Feb 16 '22 15:02 lhotari

Thread Pool Configure with goal of limiting memory usage maximum available. Typically this is >50 and <500

It makes sense to me. But I don't think it should be a blocker for 2.10.0 release. I found 2.10.0 release has been delayed for some time. The previous 2.9.0 release is also a bad case.

We shouldn't add some temporary PRs during the release phase unless they are really important.

Feb 16 '22 15:02 BewareMyPower

@codelipenghui PIP-142 has been published. Please follow up on the mailing list.

Feb 16 '22 17:02 lhotari

We shouldn't add some temporary PRs during the release phase unless they are really important.

@BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid. The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.

Feb 16 '22 17:02 lhotari

We shouldn't add some temporary PRs during the release phase unless they are really important.

@BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid.

@lhotari This is not a new thing and it's something that can easily be changed by users. I don't think we need to rush into a release.

The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.

The problem is not just keeping threads busy but the cases in which you have calls that spawn to multiple brokers. In this cases, the HTTP call can come back to same broker and we can have a deadlock, no matters how many threads we have in the pool.

Feb 16 '22 18:02 merlimat

@codelipenghui https://github.com/apache/pulsar/issues/14329 has been published. Please follow up on the mailing list.

Sure

Feb 17 '22 01:02 codelipenghui

We shouldn't add some temporary PRs during the release phase unless they are really important.

@BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid. The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.

I think I need to explain more for the important term I used. IMO, a PR that could block a release during the release phase must match following rules:

It must be a bug fix.
The bug was introduced from the current release, i.e. it's a regression.
There is no workaround.

It's only my opinion. I think our release document for release manager missed something like this.

Let's look back to this PR. First, I don't think a change to the default configuration value can be treated as a bug fix. It's more like an enhancement. Because the previous stable releases all should have the same problem. Then, we can see it's not a regression. Third, it's not something serious like Log4j2 Vulnerability (CVE-2021-44228). It just make some certain cases not work for Admin API and can be fixed by configuration tuning.

In short, IMO, after a release started, we must be very careful and strict on the new PRs.

Feb 17 '22 03:02 BewareMyPower

The unavailability of the Admin API is not caused by the HTTP server thread, the root cause is that the ZK callback thread is blocked.

When an admin API calls the ZK metadatastore API, it gets the ZK data by call the CompletableFuture, note that we did not use the executor to execute the CompletableFuture#complete() in ZKMetadataStore.java#L171. In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: metadata.getAsync().get(30, TimeUnit.SECONDS).

How to solve this problem?

Use an executor to execute the callback that passes data to Pulsar in ZK callback
Don't convert async to sync calls, so there are make some PR that converts sync to async calls

How to reproduce the ZK callback thread is blocked:

docker run -d -p 2181:2181 --name test-zookeeper zookeeper

public class Main {
    private static final long CACHE_REFRESH_TIME_MILLIS = TimeUnit.MINUTES.toMillis(5);

    public static void printThread(String name) {
        System.out.println(name + " thread name -> " + Thread.currentThread().getName());
    }

    public static void main(String[] args) throws Exception {
        ZooKeeper zkc = new ZooKeeper("localhost:2181", 60_000, null);

        System.out.println("Check the zk connect");
        CountDownLatch zkLatch = new CountDownLatch(1);
        new Thread(() -> {
            while (true) {
                if (zkc.getState().isConnected()) {
                    zkLatch.countDown();
                    break;
                }
            }
        }).start();
        if (!zkLatch.await(5, TimeUnit.SECONDS)) {
            throw new Exception("zk connect failed");
        }

        AsyncLoadingCache<String, byte[]> objCache = Caffeine.newBuilder()
                .refreshAfterWrite(CACHE_REFRESH_TIME_MILLIS, TimeUnit.MILLISECONDS)
                .buildAsync((key, executor) -> {
                    CompletableFuture<byte[]> future = new CompletableFuture<>();
                    zkc.multi(Lists.newArrayList(Op.getData("/")), (rc, path, ctx, opResults) -> {
                        printThread("zk callback");
                        future.complete(null);
                    }, null);
                    return future;
                });

        CountDownLatch countDownLatch = new CountDownLatch(1);

        // Reproduce the ZK callback is blocked
        System.out.println("async get start");
        objCache.get("/").whenComplete((unused, ignored) -> {
            printThread("async get done");
            try {
                System.out.println("zk thread will blocked after sync get");
                System.out.println("sync get start");
                objCache.get("/1").get(5, TimeUnit.SECONDS);
                // Unreachable
                printThread("sync get done");
                countDownLatch.countDown();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                countDownLatch.countDown();
            }
        });

        countDownLatch.await();
    }
}

Feb 17 '22 08:02 nodece

When an admin API calls the ZK metadatastore API, it gets the ZK data by call the CompletableFuture, note that we did not use the executor to execute the CompletableFuture#complete() in ZKMetadataStore.java#L171. In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: metadata.getAsync().get(30, TimeUnit.SECONDS).

The blocked thread in #13666 is a HTTP server thread.

"pulsar-web-40-28" #238 prio=5 os_prio=0 tid=0x00007f5a4000d800 nid=0x2bcf waiting on condition [0x00007f5961d3b000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000005c5529d40> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalDeleteSubscriptionForNonPartitionedTopic(PersistentTopicsBase.java:1498)

I'll clarify what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API. I understand that it's necessary to not block in Zookeeper callbacks, but that is a different problem, which isn't related to Servlet API change.

Feb 17 '22 09:02 lhotari

@BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid.

@lhotari This is not a new thing and it's something that can easily be changed by users. I don't think we need to rush into a release.

I agree. Have I been rushing this? Instead, you can say that a lot of PRs with "make async" have been pushed and merged recently. This has introduced several known regressions that have been fixed. We don't know which are regressions that just haven't been found yet.

My valid questions were never answered by the contributors of the "make async" changes.

I'm expecting that there are issues or a PIP which is referred to. @merlimat WDYT?

The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.

The problem is not just keeping threads busy but the cases in which you have calls that spawn to multiple brokers. In this cases, the HTTP call can come back to same broker and we can have a deadlock, no matters how many threads we have in the pool.

I'll clarify: what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API. That won't solve any problems on it's own. Any problems that it might solve would be solved also by configuring Jetty as it is recommended to be configured when there are blocking calls involved. The recommended maximum thread pool size is 50 to 500 for Jetty. I have been going through some details and there are multiple other things that are not properly configured. I'll be following up with separate PRs. (UPDATE: the draft PR which fixes backpressure handling is #14353).

I'd assume that the reason for deadlocks when thread pool size is properly configured are caused by locks. I like to see an example of a deadlock which couldn't be resolved by continuing to use the blocking servlet api. I'm not against the changes from blocking API to async API, but I think changes need proper justification, especially when the "make async" changes have been initiated without referring any reported issues or a PIP.

Feb 17 '22 10:02 lhotari

I think I need to explain more for the important term I used. IMO, a PR that could block a release during the release phase must match following rules:

I haven't requested to block the release.

Let's look back to this PR. First, I don't think a change to the default configuration value can be treated as a bug fix. It's more like an enhancement. Because the previous stable releases all should have the same problem. Then, we can see it's not a

The Jetty documentation recommends values 50-500 for the maximum thread pool size. That is a fact, so there cannot be different opinions on this. Since the default configuration value doesn't fall in the recommended value range, my opinion is that this is a bug. For some people a bug is a feature. :) Does it really matter whether we call this a bug or an improvement? The fact is that the current value doesn't fall in the recommended value range.

Feb 17 '22 10:02 lhotari

Instead, you can say that a lot of PRs with "make async" have been pushed and merged recently.

Yeah, I noticed these PRs recently as well. But are these PRs blockers for 2.10.0 release? IMO, they should not be blockers as well. I've thought they are intended to be included in Pulsar 2.11.0.

I haven't requested to block the release.

Sorry, I might missed some context. I just saw this PR in the 2.10.0 release email list. This PR should focus on the fix itself, but the previous discussion might go far for the release issue.

My valid questions were never answered by the contributors of the "make async" changes.

It's a pity to see the lack of communication. AFAIK, @Technoboy- is also preparing for a PIP to make admin APIs async. I think you should have a discussion about:

Whether this PR solve the root problem?
Based on this PR, is making admin APIs async meaningful?

Feb 17 '22 10:02 BewareMyPower

I'll continue the discussion in the PIP-142 discussion email.

Feb 17 '22 11:02 BewareMyPower

My valid questions were never answered by the contributors of the "make async" changes.

It's a pity to see the lack of communication. AFAIK, @Technoboy- is also preparing for a PIP to make admin APIs async. I think you should have a discussion about:

Whether this PR solve the root problem?

Great point @BewareMyPower . I hope that the problem would first be discussed or reported before a PIP is created. @Technoboy- Would you be able to start some discussion even before the PIP is ready?

Based on this PR, is making admin APIs async meaningful?

That's also a valid question to ask. When we work together, we can learn together.

Feb 17 '22 11:02 lhotari

This is not something that can block release. The value is already configurable. I believe that there is no hurry in committing this change, and we can discuss about a new value or decide that the default should not be changed.

Feb 17 '22 16:02 eolivelli

When an admin API calls the ZK metadatastore API, it gets the ZK data by call the CompletableFuture, note that we did not use the executor to execute the CompletableFuture#complete() in ZKMetadataStore.java#L171. In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: metadata.getAsync().get(30, TimeUnit.SECONDS).

The blocked thread in #13666 is a HTTP server thread.
"pulsar-web-40-28" #238 prio=5 os_prio=0 tid=0x00007f5a4000d800 nid=0x2bcf waiting on condition [0x00007f5961d3b000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000005c5529d40> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalDeleteSubscriptionForNonPartitionedTopic(PersistentTopicsBase.java:1498)
I'll clarify what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API. I understand that it's necessary to not block in Zookeeper callbacks, but that is a different problem, which isn't related to Servlet API change.

When the ZK callback thread is blocked in the WEB thread, another admin API request the ZK metadata store is not working, so you see this thread stack.

Feb 17 '22 16:02 nodece

The pr had no activity for 30 days, mark with Stale label.

Mar 20 '22 01:03 github-actions[bot]

The pr had no activity for 30 days, mark with Stale label.

May 28 '22 02:05 github-actions[bot]

@lhotari - should this old debate on a PR be closed as it was hopefully resolved?

Jul 16 '23 19:07 dave2wave

The Pulsar Admin client doesn't have a limit of how many connections it opens to a single broker. There is issue #22041 for addressing that.

Feb 12 '24 06:02 lhotari