Test Performance of the New Queue after Redesign
- [X] I have searched to see if a similar issue already exists.
Is your feature request related to a problem? Please describe.
Since we are redesigning the queue from scratch, it would be great to test the performance of the new Queue with a beta release. Thanks @freddyaboulton for the idea.
We should test identical spaces with different Gradio versions. Related #572, #1337 Any further suggestions would be most welcome.
Describe the solution you'd like
I think we should test the performance (the list is ordered from high priority to low priority):
- tested manually like a standard user
- of the first event
- when there are very many short events
- when there are few long events
- when there are very many short events with the concurrency parameter
- when there are few long events with the concurrency parameter
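The short-event vs. long-event scenarios above can be modeled with a toy worker pool to get a feel for what the load tests should show; a minimal asyncio sketch (the event counts, durations, and function name are illustrative assumptions, not from this issue):

```python
import asyncio
import time

async def run_events(n_events, event_duration, concurrency):
    """Push n_events jobs through a pool of `concurrency` workers and
    return total wall-clock time -- a toy model of the queue."""
    sem = asyncio.Semaphore(concurrency)

    async def one_event():
        async with sem:
            await asyncio.sleep(event_duration)

    start = time.monotonic()
    await asyncio.gather(*(one_event() for _ in range(n_events)))
    return time.monotonic() - start

# many short events vs. few long events, with and without extra concurrency
many_short = asyncio.run(run_events(100, 0.01, 1))
few_long = asyncio.run(run_events(4, 0.25, 1))
few_long_conc = asyncio.run(run_events(4, 0.25, 4))
print(f"{many_short:.2f}s {few_long:.2f}s {few_long_conc:.2f}s")
```

With a single worker the long events serialize (~1 s total here), while raising concurrency to the number of events collapses that to roughly one event's duration, which is the effect the concurrency-parameter tests should surface.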
I think we should test it from the browser, since we use WebSockets and that will be easier. Could we use Playwright to send a few or many events, @dawoodkhan82, @pngwn?
Additional context
@pngwn had some concerns about opening a new WS connection for each request; I would like to address that concern as well, if possible.
This library can be used for benchmarking the creation of many WebSocket connections: https://github.com/observing/thor
I tried thor out; these are the results for 100 connections. Not sure why, but there were failing connections when I used 200+ connections for benchmarking. It could be related to the OS, Thor, or FastAPI; I need to test with a different tool in the future.
It seems that when creating a WS connection, handshaking takes ~280 ms and latency is ~10 ms on average.
```
➜ gradio git:(queue-refactor-backend) ✗ thor --amount 100 ws://127.0.0.1:7860/queue/join
Thor:                                                  version: 1.0.0

God of Thunder, son of Odin and smasher of WebSockets!

Thou shall:
- Spawn 16 workers.
- Create all the concurrent/parallel connections.
- Smash 100 connections with the mighty Mjölnir.

The answers you seek shall be yours, once I claim what is mine.

Connecting to ws://127.0.0.1:7860/queue/join

Online               807 milliseconds
Time taken           836 milliseconds
Connected            100
Disconnected         0
Failed               0
Total transferred    25.29kB
Total received       29.59kB

Durations (ms):
                     min    mean   stddev  median  max
Handshaking          24     280    207     308     619
Latency              0      10     13      8       112

Percentile (ms):
                     50%    66%    75%     80%     90%    95%    98%    98%    100%
Handshaking          308    400    469     505     576    605    616    619    619
Latency              8      11     14      15      21     27     46     112    112
```
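If we switch to a different tool later, the percentile rows thor prints can be reproduced from a plain duration list; a quick sketch using the nearest-rank method (the sample values below are illustrative, not taken from the run above):

```python
def percentile(durations, pct):
    """Nearest-rank percentile of a list of durations in ms."""
    ranked = sorted(durations)
    # nearest-rank: the ceil(pct/100 * n)-th smallest value (1-indexed)
    k = max(1, -(-len(ranked) * pct // 100))  # ceiling division
    return ranked[int(k) - 1]

# hypothetical handshake durations from a 10-connection run
handshakes = [24, 130, 280, 308, 310, 400, 469, 505, 576, 619]
summary = {p: percentile(handshakes, p) for p in (50, 75, 90, 100)}
print(summary)  # → {50: 310, 75: 505, 90: 576, 100: 619}
```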
100 seems like a very low number; we need to load test with 1000s at the least.
100- and 1000-connection tests on HF Spaces:
```
(venv) hugginface-5g@Omers-MacBook-Pro gradio % thor --amount 100 wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join
Thor:                                                  version: 1.0.0

God of Thunder, son of Odin and smasher of WebSockets!

Thou shall:
- Spawn 8 workers.
- Create all the concurrent/parallel connections.
- Smash 100 connections with the mighty Mjölnir.

The answers you seek shall be yours, once I claim what is mine.

Connecting to wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join

Online               859 milliseconds
Time taken           1428 milliseconds
Connected            100
Disconnected         0
Failed               0
Total transferred    28.91kB
Total received       29.69kB

Durations (ms):
                     min    mean   stddev  median  max
Handshaking          313    514    100     566     667
Latency              NaN    NaN    NaN     NaN     NaN

Percentile (ms):
                     50%    66%    75%     80%     90%    95%    98%    98%    100%
Handshaking          566    583    587     593     624    635    653    667    667
Latency              NaN    NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN

(venv) hugginface-5g@Omers-MacBook-Pro gradio % thor --amount 1000 wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join
Thor:                                                  version: 1.0.0

God of Thunder, son of Odin and smasher of WebSockets!

Thou shall:
- Spawn 8 workers.
- Create all the concurrent/parallel connections.
- Smash 1000 connections with the mighty Mjölnir.

The answers you seek shall be yours, once I claim what is mine.

Connecting to wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join

Online               9362 milliseconds
Time taken           9410 milliseconds
Connected            1000
Disconnected         0
Failed               0
Total transferred    289.06kB
Total received       296.88kB

Durations (ms):
                     min    mean   stddev  median  max
Handshaking          779    4296   1891    4304    8980
Latency              NaN    NaN    NaN     NaN     NaN

Percentile (ms):
                     50%    66%    75%     80%     90%    95%    98%    98%    100%
Handshaking          4304   5107   5532    5742    6347   7926   8857   8958   8980
Latency              NaN    NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN
```
I tested connecting 8000 clients to an HF Space, but my 8 GB of RAM filled up with that many clients. There were also OSErrors popping up; there might be a file-descriptor limit on opening WS connections, which might need to be raised as well. I tested the Space from my phone since my RAM was full, and the Space successfully kicked the 8000 disconnected clients from the queue and moved me to rank 1, though it took some iterations/seconds to do that. Also, the ETA precision is above 90%, but it is not exact.
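Those OSErrors are most likely `EMFILE` (too many open files): each open WebSocket holds one file descriptor, and the default per-process soft limit is often only 256-1024. One way to raise the limit from inside the test script before spawning clients (Unix only; a sketch, not part of the scripts in this issue):

```python
import resource

# each open websocket consumes one file descriptor; raise the soft
# limit toward the hard limit before spawning thousands of clients
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = hard if hard != resource.RLIM_INFINITY else 65536
try:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
except ValueError:
    # some OSes (e.g. macOS) cap the limit below the reported hard limit
    pass
print("soft fd limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

The shell equivalent is `ulimit -n 65536`, which also has to stay below the kernel's per-process cap.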
You can replicate the test with this file. I had to do some workarounds, since locust does not fully support raw WS communication and there were no docs available. So right now it does not produce benchmarks, but keeps x number of connections open at all times, and whenever a WS event's process is completed it restarts it.
```
$ locust
```

`locustfile.py`:

```python
import json
import random
import time

from locust import task
from locust_plugins.users import SocketIOUser


class MySocketIOUser(SocketIOUser):
    @task
    def my_task(self):
        self.my_value = None
        # self.connect("ws://localhost:7860/queue/join")
        self.connect("wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join")

        # join the queue with a random hash
        hashe = random.getrandbits(1024)
        msg = '{"hash": "' + str(hashe) + '"}'
        json.loads(msg)  # sanity-check that the payload is valid JSON
        self.ws.send(msg)

        # wait until we get a push message in on_message
        while True:
            while not self.my_value:
                time.sleep(0.1)
            msg = self.my_value["msg"]
            if msg == "send_data":
                self.ws.send('{"data": ["text2"], "fn": 0}')
            elif msg == "estimation":
                pass
            elif msg == "process_starts":
                pass
            elif msg == "process_completed":
                print(self.my_value, time.time())
                return True
            self.my_value = None
        # wait for additional pushes, while occasionally sending heartbeats,
        # like a real client would
        # self.sleep_with_heartbeat(10)

    def on_message(self, message):
        self.my_value = json.loads(message)


if __name__ == "__main__":
    host = "ws://localhost:7861/queue/test"
```
I thought we could test the new queue better with Python, and found a way to send JSON messages as text in locust.
I wanted to try it, and even created a Space for load-testing another Space. :) cc: @aliabid94 @abidlabs
https://huggingface.co/spaces/farukozderim/load-test-queue
I could load the queue with 23k clients, though throughput starts to decrease a lot at large queue sizes due to communication overhead. One needs to use a sparse status_update_rate for these kinds of demos, or cap the queue.
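Capping the queue amounts to rejecting joins once a bound is reached; a toy sketch of the idea (the class and method names are hypothetical, not Gradio's API):

```python
from collections import deque


class BoundedQueue:
    """Reject new joins once the queue reaches max_size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.events = deque()

    def join(self, event):
        if len(self.events) >= self.max_size:
            return False  # tell the client the queue is full
        self.events.append(event)
        return True


q = BoundedQueue(max_size=2)
results = [q.join(i) for i in range(3)]
print(results)  # → [True, True, False]
```

Rejecting at join time keeps the per-tick status broadcast bounded, which is exactly the overhead that dominated the 23k-client run.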
```python
#!/usr/bin/env python
# usage
#   Spaces: python3 test_ws.py 100
#   Local:  python3 test_ws.py 100 7860
# There are file descriptor limits that I could not get past; 1000 clients does not work well.
import asyncio
import json
import sys
import time

import websockets

client_count = int(sys.argv[1])
if len(sys.argv) == 3:
    print("Testing on localhost")
    port = sys.argv[2]
    host = f"ws://localhost:{port}/queue/join"
else:
    host = "wss://spaces.huggingface.tech/farukozderim/test-new-queue/queue/join"

duration_list = []


async def startup(client_count):
    await asyncio.gather(*[client(i) for i in range(client_count)])


async def client(rank):
    await asyncio.sleep(rank * 0.01)  # the server cannot handle many instantaneous connections
    async with websockets.connect(host, timeout=10000) as websocket:
        start = time.time()
        while True:
            raw_msg = await websocket.recv()
            jso = json.loads(raw_msg)
            msg = jso["msg"]
            if msg == "send_data":
                await websocket.send('{"data": ["text2"]}')
            elif msg == "estimation":
                # print(jso)
                pass
            elif msg == "process_starts":
                pass
            elif msg == "process_completed":
                end = time.time()
                duration = end - start
                if jso["success"]:
                    duration_list.append([duration, start, end])
                return


begin = time.time()
out = asyncio.run(startup(client_count))
end = time.time()
print(f"Total-duration: {round(end - begin, 3)}, success: {len(duration_list)} out of {client_count}")
# print(duration_list)
```
I feel like this is more or less complete; there's no need to keep it open anymore.