scale-clojure-web-app

How can I get beyond 600K?

Open Ranler opened this issue 11 years ago • 9 comments

OS: CentOS6.2 Kernel: 2.6.32-279.14.1.el6.x86_64 RAM: 32GB ECC CPU: Xeon E5645 @2.40GHz * 2 JDK: 1.6.0_31

First attempt: I followed the settings from http://http-kit.org/600k-concurrent-connection-http-kit.html

The client was mostly fine:

...
time 200s, concurrency: 547579, total requests: 2560038, thoughput: 26.39M/s, 12786.00 requests/seconds
time 201s, concurrency: 547779, total requests: 2580908, thoughput: 26.35M/s, 12809.68 requests/seconds
time 202s, concurrency: 548083, total requests: 2598713, thoughput: 26.33M/s, 12816.88 requests/seconds
time 203s, concurrency: 548558, total requests: 2625809, thoughput: 26.42M/s, 12886.39 requests/seconds
time 205s, concurrency: 548709, total requests: 2631873, thoughput: 26.29M/s, 12799.57 requests/seconds
remote closed cleanly
remote closed cleanly
remote closed cleanly
remote closed cleanly
remote closed cleanly
remote closed cleanly
remote closed cleanly

Then the server started throwing errors:

Sat Mar 30 10:06:45 CST 2013 [server-loop] ERROR - queue size exceeds the limit 20480, please increase :queue-size when run-server if this happens often
java.util.concurrent.RejectedExecutionException
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:78)
        at org.httpkit.server.RingHandler.handle(RingHandler.java:108)
        at org.httpkit.server.HttpServer.decodeHttp(HttpServer.java:114)
        at org.httpkit.server.HttpServer.doRead(HttpServer.java:168)
        at org.httpkit.server.HttpServer.run(HttpServer.java:239)
        at java.lang.Thread.run(Thread.java:662)
...

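The stack trace comes from a `ThreadPoolExecutor` rejecting work. The sketch below is not http-kit's code; it assumes a fixed-size pool with a bounded queue and the default `AbortPolicy`, which is consistent with the frames above, and shows that once every worker is busy and the queue is full, further submissions throw `RejectedExecutionException`. Raising `:queue-size` enlarges that queue, which is presumably why the error goes away. `BoundedQueueDemo` and `countRejected` are illustrative names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only (not http-kit's implementation): a fixed pool with a bounded
// work queue and AbortPolicy, the combination implied by the stack trace.
public class BoundedQueueDemo {

    /** Submits `tasks` blocking jobs to `threads` workers with a queue of
     *  `queueSize` slots; returns how many submissions were rejected. */
    public static int countRejected(int threads, int queueSize, int tasks)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
        // Every task blocks on this latch, so workers stay busy and the queue fills.
        final CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.submit(new Runnable() {
                    public void run() {
                        try { release.await(); } catch (InterruptedException ignored) { }
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // queue full and no spare worker: AbortPolicy throws
            }
        }
        release.countDown(); // let the blocked tasks finish
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected: " + countRejected(2, 4, 10));
    }
}
```

With 2 workers and a queue of 4, the first 6 submissions are accepted (2 running, 4 queued) and the remaining 4 are rejected.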
Second attempt: I changed queue-size in main.clj:

...
(defn -main [& args]
  ;; :queue-size raised well above the 20480 limit reported in the error above
  (run-server (-> handler wrap-keyword-params wrap-params)
              {:port 8000 :queue-size 1024000})
  (println "Server started. listen at 0.0.0.0@8000"))

After that the test ran normally. Client:

...
time 353s, concurrency: 597072, total requests: 5530283, thoughput: 32.15M/s, 15656.77 requests/seconds
time 354s, concurrency: 597072, total requests: 5568259, thoughput: 32.16M/s, 15714.85 requests/seconds
time 355s, concurrency: 597072, total requests: 5586378, thoughput: 32.30M/s, 15720.95 requests/seconds
time 356s, concurrency: 597072, total requests: 5604011, thoughput: 32.34M/s, 15724.11 requests/seconds
time 357s, concurrency: 597072, total requests: 5629155, thoughput: 32.38M/s, 15746.50 requests/seconds
time 358s, concurrency: 597072, total requests: 5651914, thoughput: 32.44M/s, 15765.80 requests/seconds

Third attempt: I raised the per-IP concurrency in ConcurrencyBench.java:

    final static int PER_IP = 25000;

It basically tops out at about 660K and won't go any higher; I keep getting Connection timed out:

...
time 338s, concurrency: 664873, total requests: 4863842, thoughput: 26.50M/s, 14386.49 requests/seconds
time 339s, concurrency: 664871, total requests: 4867369, thoughput: 26.43M/s, 14354.13 requests/seconds
time 340s, concurrency: 664870, total requests: 4871330, thoughput: 26.35M/s, 14323.02 requests/seconds

Is the bottleneck now the CPU? Or should I add more IP addresses to push concurrency higher?

Ranler avatar Mar 30 '13 03:03 Ranler

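Nice machine you have. It should easily get past 1 million; tune the TCP parameters and it should easily pass 2 million, and with the test client moved to another machine, 3 million is achievable.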

Suggestions:

  1. Upgrade to JDK 7.
  2. In ConcurrencyBench, try adjusting these values:

    public static int randidelTime() {
        // Make this number larger
        int ms = 5000 + r.nextInt(45000); // 5s ~ 50s
        return ms;
    }

        if (opened < CONCURENCY) {
            // Try increasing this number
            Thread.sleep(20); // opens at most 5000 per second
        }

    // A single IP can reach about 60,000 connections
    final static int PER_IP = 20000;
    final static InetSocketAddress ADDRS[] = new InetSocketAddress[30];
    // 600k concurrent connections
    final static int CONCURENCY = PER_IP * ADDRS.length;

    static {
        // for i in `seq 200 240`; do sudo ifconfig eth0:$i 192.168.1.$i up ; done
        final int PORT = 8000;
        final int IP_START = 200;
        ...
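The numbers in this excerpt can be checked with simple arithmetic. A minimal sketch, assuming every client connection targets the same server ip:port, so each one needs a distinct (client IP, client port) pair and the usable port range caps connections per client IP; the ~60K per IP figure is shenfeng's, not measured here. On many Linux systems the default ephemeral range (`net.ipv4.ip_local_port_range`) is narrower than that, so it may need widening. `BenchSizing`, `ipsNeeded` and `rampSeconds` are hypothetical helpers, not part of ConcurrencyBench.

```java
// Back-of-envelope sizing for the benchmark client (sketch under the
// assumptions stated above; not part of ConcurrencyBench itself).
public class BenchSizing {

    /** Client IP aliases needed to reach `target` connections at `perIp` each. */
    public static int ipsNeeded(long target, int perIp) {
        return (int) ((target + perIp - 1) / perIp); // ceiling division
    }

    /** Minimum seconds to open `connections` at no more than `openPerSecond`. */
    public static double rampSeconds(long connections, int openPerSecond) {
        return (double) connections / openPerSecond;
    }

    public static void main(String[] args) {
        long concurrency = 20000L * 30; // PER_IP * ADDRS.length
        System.out.println("target connections: " + concurrency);
        System.out.println("ramp-up at 5000/s: " + rampSeconds(concurrency, 5000) + " s");
        System.out.println("client IPs for 1M at 60K/IP: " + ipsNeeded(1000000, 60000));
    }
}
```

With the values above, 20000 × 30 = 600,000 connections take at least 120 s to open at 5000 per second, and reaching 1 million at 60K per IP would need 17 client IPs.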

shenfeng avatar Mar 30 '13 04:03 shenfeng

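It's a small cluster at my school, heh. I plan to push it to the limit and turn this into a test case for http-kit. I'll try moving the client off the machine; I'm not sure whether network I/O will become the bottleneck.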

Ranler avatar Mar 30 '13 05:03 Ranler

I just changed the code. Being in school is great! Try making randidelTime much larger; the network overhead will be much lower.

    public static int randidelTime() {
        int ms = 10000 + r.nextInt(90000); // 10s ~ 100s
        return ms;
    }
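Why a longer idle time lowers network overhead can be estimated with Little's law. The model below is an assumption, not the benchmark's actual behaviour: each connection sends one request and then idles for `minMs + r.nextInt(spanMs)` milliseconds, and request latency is ignored, so the aggregate rate is roughly the number of connections divided by the mean idle time. `IdleRateModel` is a hypothetical class.

```java
// Steady-state request-rate model (an assumption; latency is ignored).
public class IdleRateModel {

    /** Mean of minMs + r.nextInt(spanMs): nextInt is uniform over 0..spanMs-1. */
    public static double meanIdleMs(int minMs, int spanMs) {
        return minMs + (spanMs - 1) / 2.0;
    }

    /** Approximate requests per second generated by all connections. */
    public static double requestsPerSecond(long connections, int minMs, int spanMs) {
        return connections / (meanIdleMs(minMs, spanMs) / 1000.0);
    }

    public static void main(String[] args) {
        long n = 600000;
        // Old setting: 5000 + r.nextInt(45000), i.e. 5s ~ 50s
        System.out.printf("5s~50s:   %.0f req/s%n", requestsPerSecond(n, 5000, 45000));
        // New setting: 10000 + r.nextInt(90000), i.e. 10s ~ 100s
        System.out.printf("10s~100s: %.0f req/s%n", requestsPerSecond(n, 10000, 90000));
    }
}
```

Going from 5s~50s to 10s~100s doubles the mean idle time (about 27.5 s to about 55 s), so under this model the same 600,000 connections generate roughly half as many requests per second.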

Try a single machine first; that setup is the simplest.

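To move the client to another machine, a small change to ConcurrencyBench is enough. A single IP can reach 60,000 connections. I don't have the resources for this kind of test, so I'm very interested in the results; I hope you can share them.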

shenfeng avatar Mar 30 '13 05:03 shenfeng

Definitely. The machines here would just sit idle otherwise, so let me know if you need anything tested.

I'll test on a single machine first.

Ranler avatar Mar 30 '13 05:03 Ranler

Now I've hit another bottleneck: at around 1 million concurrent connections I start getting Connection reset by peer.

I switched the JDK to:

$ java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

Server: I tried queue-size 204800 and 309600; neither made much difference.

Client:

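  • For the random range in randidelTime I tried 90000 (reaching 900K+), 120000, 150000, 200000, 270000 and 300000 (just under 1.1M). Concurrency rose slightly (by a few tens of thousands), but not noticeably.
  • For Thread.sleep I tried 20, 30 and 40; the maximum concurrency barely changed.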

Ranler avatar Mar 30 '13 11:03 Ranler

1 million shouldn't be the system's limit; I once reached 1.6 million (you need to revert this change: https://github.com/shenfeng/dictionary/commit/0b741e8a7bda7cee2c240281ef7f714672cf5278 )

http://shenfeng.me/how-far-epoll-can-push-concurrent-socket-connection.html

A few things you can check:

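  • Use jvisualvm to see whether the process is being worn down by GC, and try adjusting the JVM parameters in run_server; usually the first experiment is simply to give it more memory.
  • Use htop or another tool to monitor the system's CPU and memory usage, and see whether the machine itself is overloaded at that point.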

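I've run into Connection reset by peer too and I'm not entirely sure why; it may be resource exhaustion. Try accessing the server directly from a browser to see whether it still responds, and check the latency.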
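The browser check can also be scripted, which makes it easy to repeat while the benchmark is running. A sketch using only `java.net.HttpURLConnection`; the default URL `http://127.0.0.1:8000/` assumes the server from this thread is running locally, and `Probe`/`latencyMs` are illustrative names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// One-shot responsiveness probe: sends a GET and measures wall-clock latency.
public class Probe {

    /** Returns latency in ms, or -1 if the request failed (refused, timeout, reset, bad URL). */
    public static long latencyMs(String url, int timeoutMs) {
        long start = System.nanoTime();
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            int status = conn.getResponseCode(); // sends the request, waits for the status line
            System.out.println("HTTP " + status);
            return (System.nanoTime() - start) / 1000000L;
        } catch (IOException e) {
            System.out.println("failed: " + e);
            return -1;
        } finally {
            if (conn != null) conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://127.0.0.1:8000/";
        System.out.println("latency: " + latencyMs(url, 5000) + " ms");
    }
}
```

A return value of -1 means the request failed; otherwise it is the elapsed time in milliseconds, including connection setup.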

shenfeng avatar Mar 30 '13 11:03 shenfeng

shenfeng, sorry, I haven't touched the machines lately. I continued testing today; here are the results:

  1. First, with the same test parameters as last time, I checked GC: [figure: jvm_100_1]

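At the peak the old generation is full, and at around 1 million concurrent connections exceptions are thrown frequently. This is most likely because the JVM does not have enough memory.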

  2. I set the JVM to -Xmx6144m -Xms6144m and tested again: [figure: jvm_150_1]

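This time the old generation did not fill up, and exceptions started at around 1.5 million concurrent connections. There is a stretch where GC is continuously busy, and exceptions are reported throughout that period.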
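Alongside jvisualvm, old-generation occupancy can be read from inside the process with the standard `java.lang.management` API. A sketch; pool names depend on the collector in use ("PS Old Gen", "Tenured Gen", "CMS Old Gen", "G1 Old Gen" and so on), so detecting the old generation by name is a heuristic, and `HeapWatch` is a hypothetical class. `Runtime.maxMemory()` is printed as a quick check that the `-Xmx` setting actually took effect.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Prints heap pool usage; old-gen detection is a name heuristic, not an API contract.
public class HeapWatch {

    public static boolean isOldGen(String poolName) {
        return poolName.contains("Old") || poolName.contains("Tenured");
    }

    /** Percent of max used by the old generation, or -1 if not found or unbounded. */
    public static double oldGenPercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.isValid() && pool.getType() == MemoryType.HEAP && isOldGen(pool.getName())) {
                MemoryUsage u = pool.getUsage();
                if (u != null && u.getMax() > 0) return 100.0 * u.getUsed() / u.getMax();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                System.out.println(pool.getName() + ": " + pool.getUsage());
            }
        }
        System.out.printf("old gen: %.1f%%%n", oldGenPercent());
    }
}
```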

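The figure below shows the host. Overall the CPU is not busy; it probably spends most of its time waiting on I/O. Memory is almost completely used up, yet the test process was set to -Xmx4096m and the server process to -Xmx6144m, only 10 GB combined, while the host is using nearly 30 GB. Perhaps this is because Java NIO allocated a lot of off-heap memory (or is it the memory the kernel needs to manage so many connections?)

[figure: TM 20130404111921]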

As you said earlier, http-kit's maximum concurrency depends on several factors.

Ranler avatar Apr 04 '13 03:04 Ranler

Also, do you have any recommendation on which GC to use?

Ranler avatar Apr 04 '13 04:04 Ranler

"Perhaps this is because Java NIO allocated a lot of off-heap memory (or is it the memory the kernel needs to manage so many connections?)"

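http-kit only uses 64 KB of off-heap memory (a single buffer shared by all connections). My guess is that the TCP read/write buffers consumed all the memory. You could look up how to set them smaller. The default may be around 8 KB; lowering it to 2 KB or 4 KB could double the number of connections.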
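To see why smaller TCP buffers could roughly double the connection count, here is a back-of-envelope estimate. It assumes one read buffer plus one write buffer per connection at the sizes quoted above (about 8 KB by default, 2 to 4 KB after tuning); real kernels size buffers dynamically, so the result is only an order of magnitude. System-wide tuning is normally done in the kernel (for example `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem`); the per-socket `SO_RCVBUF` call is included only to show the JDK 7 NIO API, and nothing in this thread suggests http-kit exposes it. `TcpBufferEstimate` is a hypothetical class.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

// Rough socket-buffer memory estimate under the assumptions stated above.
public class TcpBufferEstimate {

    /** Estimated GiB of socket buffers for `conns` connections. */
    public static double bufferGiB(long conns, int readBytes, int writeBytes) {
        return conns * (double) (readBytes + writeBytes) / (1024.0 * 1024 * 1024);
    }

    /** Per-socket override via JDK 7 NIO; the kernel may round or double the value. */
    public static int shrinkReceiveBuffer(SocketChannel ch, int bytes) throws IOException {
        ch.setOption(StandardSocketOptions.SO_RCVBUF, bytes);
        return ch.getOption(StandardSocketOptions.SO_RCVBUF);
    }

    public static void main(String[] args) throws IOException {
        long conns = 1500000;
        System.out.printf("8 KB + 8 KB per conn: %.1f GiB%n", bufferGiB(conns, 8192, 8192));
        System.out.printf("4 KB + 4 KB per conn: %.1f GiB%n", bufferGiB(conns, 4096, 4096));
        try (SocketChannel ch = SocketChannel.open()) {
            System.out.println("SO_RCVBUF after request: " + shrinkReceiveBuffer(ch, 4096));
        }
    }
}
```

Under these assumptions, halving the per-connection buffers halves the estimated buffer memory, so about twice as many connections fit in the same RAM.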

http-kit needs about 2 KB of memory per connection: 1.5 million × 2 KB = 3 GB, so for 1.5 million connections you can set the JVM heap to about 4 GB.
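shenfeng's heap estimate, written out. The 2 KB per connection comes from the comment above; the headroom multiplier is a hypothetical parameter added here to leave room for GC, not an http-kit setting. `HeapSizing` is an illustrative class.

```java
// Heap sizing rule of thumb: connections x bytes per connection x headroom.
public class HeapSizing {

    /** Suggested -Xmx in MB (rounded up) for `conns` connections. */
    public static long suggestedXmxMb(long conns, int bytesPerConn, double headroom) {
        double bytes = conns * (double) bytesPerConn * headroom;
        return (long) Math.ceil(bytes / (1024 * 1024));
    }

    public static void main(String[] args) {
        long conns = 1500000;
        long raw = suggestedXmxMb(conns, 2048, 1.0);    // connection state only
        long padded = suggestedXmxMb(conns, 2048, 1.3); // plus ~30% for GC headroom (hypothetical)
        System.out.println("raw: " + raw + " MB, with headroom: -Xmx" + padded + "m");
    }
}
```

For 1.5 million connections at 2 KB each the raw figure is 2930 MB (about 2.9 GB); with 30% headroom it comes to roughly 3.7 GB, in line with the suggested 4 GB.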

"Also, do you have any recommendation on which GC to use?"

The default is probably OK. I'm not very familiar with GC configuration either.

shenfeng avatar Apr 04 '13 10:04 shenfeng