Test Performance of the New Queue after Redesign
- [X] I have searched to see if a similar issue already exists.
Is your feature request related to a problem? Please describe.
Since we are redesigning the queue from scratch, it would be great to test the performance of the new Queue with a beta release. Thanks @freddyaboulton for the idea.
We should test identical spaces with different Gradio versions. Related #572, #1337 Any further suggestions would be most welcome.
Describe the solution you'd like
I think we should test the performance (the list is ordered from high priority to low priority):
- tested manually like a standard user
- of the first event
- when there are very many short events
- when there are few long events
- when there are very many short events with the concurrency parameter
- when there are few long events with the concurrency parameter
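The short-event vs. long-event scenarios above can be modeled with a toy worker pool to get a feel for what the load tests should show; a minimal asyncio sketch (the event counts, durations, and function name are illustrative assumptions, not from this issue):

```python
import asyncio
import time

async def run_events(n_events, event_duration, concurrency):
    """Push n_events jobs through a pool of `concurrency` workers and
    return total wall-clock time -- a toy model of the queue."""
    sem = asyncio.Semaphore(concurrency)

    async def one_event():
        async with sem:
            await asyncio.sleep(event_duration)

    start = time.monotonic()
    await asyncio.gather(*(one_event() for _ in range(n_events)))
    return time.monotonic() - start

# many short events vs. few long events, with and without extra concurrency
many_short = asyncio.run(run_events(100, 0.01, 1))
few_long = asyncio.run(run_events(4, 0.25, 1))
few_long_conc = asyncio.run(run_events(4, 0.25, 4))
print(f"{many_short:.2f}s {few_long:.2f}s {few_long_conc:.2f}s")
```

With a single worker the long events serialize (~1 s total here), while raising concurrency to the number of events collapses that to roughly one event's duration, which is the effect the concurrency-parameter tests should surface.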
I think we should test it from the browser, since we use WebSockets and that will be easier. Could we use Playwright to send a few or many events, @dawoodkhan82, @pngwn?
Additional context
@pngwn had some concerns about opening a new WS connection for each request; I would like to address that concern as well, if possible.
This library can be used for benchmarking the creation of many WebSocket connections: https://github.com/observing/thor
I tried thor out; these are the results for 100 connections. Not sure why, but there were failing connections when I used 200+ connections for benchmarking. It could be related to the OS, Thor, or FastAPI; I need to test with a different tool in the future.
It seems that when creating a WS connection, handshaking takes ~280 ms and latency is ~10 ms on average.
```
➜ gradio git:(queue-refactor-backend) ✗ thor --amount 100 ws://127.0.0.1:7860/queue/join
Thor:                                                  version: 1.0.0

God of Thunder, son of Odin and smasher of WebSockets!

Thou shall:
- Spawn 16 workers.
- Create all the concurrent/parallel connections.
- Smash 100 connections with the mighty Mjölnir.

The answers you seek shall be yours, once I claim what is mine.

Connecting to ws://127.0.0.1:7860/queue/join

Online               807 milliseconds
Time taken           836 milliseconds
Connected            100
Disconnected         0
Failed               0
Total transferred    25.29kB
Total received       29.59kB

Durations (ms):
                     min    mean   stddev  median  max
Handshaking          24     280    207     308     619
Latency              0      10     13      8       112

Percentile (ms):
                     50%    66%    75%     80%     90%    95%    98%    98%    100%
Handshaking          308    400    469     505     576    605    616    619    619
Latency              8      11     14      15      21     27     46     112    112
```
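If we switch to a different tool later, the percentile rows thor prints can be reproduced from a plain duration list; a quick sketch using the nearest-rank method (the sample values below are illustrative, not taken from the run above):

```python
def percentile(durations, pct):
    """Nearest-rank percentile of a list of durations in ms."""
    ranked = sorted(durations)
    # nearest-rank: the ceil(pct/100 * n)-th smallest value (1-indexed)
    k = max(1, -(-len(ranked) * pct // 100))  # ceiling division
    return ranked[int(k) - 1]

# hypothetical handshake durations from a 10-connection run
handshakes = [24, 130, 280, 308, 310, 400, 469, 505, 576, 619]
summary = {p: percentile(handshakes, p) for p in (50, 75, 90, 100)}
print(summary)  # → {50: 310, 75: 505, 90: 576, 100: 619}
```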
100 seems like a very low number; we need to load test with 1000s at the least.
100- and 1000-connection tests on HF Spaces:
```
(venv) hugginface-5g@Omers-MacBook-Pro gradio % thor --amount 100 wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join
Thor:                                                  version: 1.0.0

God of Thunder, son of Odin and smasher of WebSockets!

Thou shall:
- Spawn 8 workers.
- Create all the concurrent/parallel connections.
- Smash 100 connections with the mighty Mjölnir.

The answers you seek shall be yours, once I claim what is mine.

Connecting to wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join

Online               859 milliseconds
Time taken           1428 milliseconds
Connected            100
Disconnected         0
Failed               0
Total transferred    28.91kB
Total received       29.69kB

Durations (ms):
                     min    mean   stddev  median  max
Handshaking          313    514    100     566     667
Latency              NaN    NaN    NaN     NaN     NaN

Percentile (ms):
                     50%    66%    75%     80%     90%    95%    98%    98%    100%
Handshaking          566    583    587     593     624    635    653    667    667
Latency              NaN    NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN

(venv) hugginface-5g@Omers-MacBook-Pro gradio % thor --amount 1000 wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join
Thor:                                                  version: 1.0.0

God of Thunder, son of Odin and smasher of WebSockets!

Thou shall:
- Spawn 8 workers.
- Create all the concurrent/parallel connections.
- Smash 1000 connections with the mighty Mjölnir.

The answers you seek shall be yours, once I claim what is mine.

Connecting to wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join

Online               9362 milliseconds
Time taken           9410 milliseconds
Connected            1000
Disconnected         0
Failed               0
Total transferred    289.06kB
Total received       296.88kB

Durations (ms):
                     min    mean   stddev  median  max
Handshaking          779    4296   1891    4304    8980
Latency              NaN    NaN    NaN     NaN     NaN

Percentile (ms):
                     50%    66%    75%     80%     90%    95%    98%    98%    100%
Handshaking          4304   5107   5532    5742    6347   7926   8857   8958   8980
Latency              NaN    NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN
```
I tested connecting 8000 clients to an HF Space, but my 8 GB of RAM filled up with that many clients. There were also OSErrors popping up; there might be a file-descriptor limit on opening WS connections, which might need to be raised as well. I tested the Space from my phone since my RAM was full, and the Space successfully kicked the 8000 disconnected clients from the queue and moved me to rank 1, though it took some iterations/seconds to do that. Also, the ETA precision is above 90%, but it is not exact.
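Those OSErrors are most likely `EMFILE` (too many open files): each open WebSocket holds one file descriptor, and the default per-process soft limit is often only 256-1024. One way to raise the limit from inside the test script before spawning clients (Unix only; a sketch, not part of the scripts in this issue):

```python
import resource

# each open websocket consumes one file descriptor; raise the soft
# limit toward the hard limit before spawning thousands of clients
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = hard if hard != resource.RLIM_INFINITY else 65536
try:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
except ValueError:
    # some OSes (e.g. macOS) cap the limit below the reported hard limit
    pass
print("soft fd limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

The shell equivalent is `ulimit -n 65536`, which also has to stay below the kernel's per-process cap.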
You can replicate the test with this file. I had to do some workarounds, since locust does not fully support raw WS communication and there were no docs available. So right now it does not produce benchmarks, but keeps x number of connections open at all times, and whenever a WS event's process is completed it restarts it.
```
$ locust
```

`locustfile.py`:

```python
import json
import random
import time

from locust import task
from locust_plugins.users import SocketIOUser


class MySocketIOUser(SocketIOUser):
    @task
    def my_task(self):
        self.my_value = None
        # self.connect("ws://localhost:7860/queue/join")
        self.connect("wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join")

        # join the queue with a random hash
        hashe = random.getrandbits(1024)
        msg = '{"hash": "' + str(hashe) + '"}'
        json.loads(msg)  # sanity-check that the payload is valid JSON
        self.ws.send(msg)

        # wait until we get a push message in on_message
        while True:
            while not self.my_value:
                time.sleep(0.1)
            msg = self.my_value["msg"]
            if msg == "send_data":
                self.ws.send('{"data": ["text2"], "fn": 0}')
            elif msg == "estimation":
                pass
            elif msg == "process_starts":
                pass
            elif msg == "process_completed":
                print(self.my_value, time.time())
                return True
            self.my_value = None
        # wait for additional pushes, while occasionally sending heartbeats,
        # like a real client would
        # self.sleep_with_heartbeat(10)

    def on_message(self, message):
        self.my_value = json.loads(message)


if __name__ == "__main__":
    host = "ws://localhost:7861/queue/test"
```
I thought we could test the new queue better with Python, and found a way to send JSON messages as text in locust.
I wanted to try it, and even created a Space for load-testing another Space. :) cc: @aliabid94 @abidlabs
https://huggingface.co/spaces/farukozderim/load-test-queue
I could load the queue with 23k clients, though throughput starts to decrease a lot at large queue sizes due to communication overhead. One needs to use a sparse status_update_rate for these kinds of demos, or cap the queue.
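Capping the queue amounts to rejecting joins once a bound is reached; a toy sketch of the idea (the class and method names are hypothetical, not Gradio's API):

```python
from collections import deque


class BoundedQueue:
    """Reject new joins once the queue reaches max_size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.events = deque()

    def join(self, event):
        if len(self.events) >= self.max_size:
            return False  # tell the client the queue is full
        self.events.append(event)
        return True


q = BoundedQueue(max_size=2)
results = [q.join(i) for i in range(3)]
print(results)  # → [True, True, False]
```

Rejecting at join time keeps the per-tick status broadcast bounded, which is exactly the overhead that dominated the 23k-client run.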
```python
#!/usr/bin/env python
# usage
#   Spaces: python3 test_ws.py 100
#   Local:  python3 test_ws.py 100 7860
# There are file descriptor limits that I could not get past; 1000 clients does not work well.
import asyncio
import json
import sys
import time

import websockets

client_count = int(sys.argv[1])
if len(sys.argv) == 3:
    print("Testing on localhost")
    port = sys.argv[2]
    host = f"ws://localhost:{port}/queue/join"
else:
    host = "wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join"

duration_list = []


async def startup(client_count):
    await asyncio.gather(*[client(i) for i in range(client_count)])


async def client(rank):
    await asyncio.sleep(rank * 0.01)  # the server cannot handle many instantaneous connections
    async with websockets.connect(host, timeout=10000) as websocket:
        start = time.time()
        while True:
            raw_msg = await websocket.recv()
            jso = json.loads(raw_msg)
            msg = jso["msg"]
            if msg == "send_data":
                await websocket.send('{"data": ["text2"]}')
            elif msg == "estimation":
                # print(jso)
                pass
            elif msg == "process_starts":
                pass
            elif msg == "process_completed":
                end = time.time()
                duration = end - start
                if jso["success"]:
                    duration_list.append([duration, start, end])
                return


begin = time.time()
out = asyncio.run(startup(client_count))
end = time.time()
print(f"Total-duration: {round(end - begin, 3)}, success: {len(duration_list)} out of {client_count}")
# print(duration_list)
```
I feel like this is more or less complete; there's no need to keep it open anymore.