Add replica groups in dstack-service
Steps To Test
Step1: Create replica-groups-service.yml
# replica-groups-service.yml
type: service
name: replica-groups-test
python: 3.12
replica_groups:
  - name: replica-1
    replicas: 0..2
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 1 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2
  - name: replica-2
    replicas: 0..3
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 2 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2
port: 8000
Step2: dstack apply -f replica-groups-service.yml
Step3: Run load_test_replica_groups.py, substituting your own URL and TOKEN
import asyncio
import aiohttp
import time

# ==== Configuration ====
URL = "<URL>"
TOKEN = "<TOKEN>"
RPS = 8  # Requests per second
DURATION = 1800  # Duration in seconds
METHOD = "GET"  # or "POST"
# =======================

HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {TOKEN}"
}


async def send_request(session, idx):
    """Send a request and print response"""
    try:
        async with session.request(METHOD, URL, headers=HEADERS) as resp:
            text = await resp.text()
            print(f"\n[{idx}] Status: {resp.status}")
            # print small part of response (HTML preview)
            print(text[:200].strip(), "...\n")
    except Exception as e:
        print(f"[{idx}] Error: {e}")


async def run_load_test():
    total_requests = RPS * DURATION
    interval = 1.0 / RPS
    async with aiohttp.ClientSession() as session:
        start_time = time.perf_counter()
        tasks = []
        for i in range(total_requests):
            # Fire each request as its own task so slow responses do not delay the schedule
            task = asyncio.create_task(send_request(session, i + 1))
            tasks.append(task)
            await asyncio.sleep(interval)
        await asyncio.gather(*tasks)
        elapsed = time.perf_counter() - start_time
        print(f"\n✅ Sent {total_requests} requests in {elapsed:.2f}s "
              f"(~{total_requests/elapsed:.2f} RPS)")


if __name__ == "__main__":
    asyncio.run(run_load_test())
Expected Output: Each group gets one replica
Submit the run replica-groups-test? [y/n]: y
NAME BACKEND GPU PRICE STATUS SUBMITTED
replica-groups-test - - running 07:31
group=0 replica=0 aws (us-east-2) - $0.0832 running 07:32
group=1 replica=1 aws (us-east-2) - $0.0832 running 07:32
Later, each group scales according to its own configuration: group 0 scales to 2 replicas, and group 1 scales to 3.
Expected output:
NAME BACKEND GPU PRICE STATUS SUBMITTED
replica-groups-test - - running 9 mins ago
group=0 replica=0 aws (us-east-2) - $0.0832 running 8 mins ago
replica=2 aws (us-east-2) - $0.0832 running 3 mins ago
group=1 replica=1 aws (us-east-2) - $0.0832 running 8 mins ago
replica=3 aws (us-east-2) - $0.0832 running 3 mins ago
replica=4 aws (us-east-2) - $0.0832 running 3 mins ago
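The replica counts above can be sanity-checked with a little arithmetic. The sketch below is not dstack's autoscaler; it is a simplified target-tracking estimate that assumes each group is sized against the full request rate of the load script (8 RPS) and is clamped to its own `replicas` range. The function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(rps: float, target: float, min_replicas: int, max_replicas: int) -> int:
    """Estimate replicas needed to keep per-replica RPS at or below `target`."""
    wanted = math.ceil(rps / target) if rps > 0 else 0
    # Clamp to the group's configured range (e.g. 0..2 or 0..3)
    return max(min_replicas, min(max_replicas, wanted))


# Assumption: both groups are sized against the full 8 RPS from the load script.
groups = {"replica-1": (0, 2), "replica-2": (0, 3)}
for name, (low, high) in groups.items():
    print(name, desired_replicas(rps=8, target=2, min_replicas=low, max_replicas=high))
```

Under these assumptions both groups want 4 replicas and are capped at their maximums, 2 and 3, which matches the listing. With RPS = 2 the same estimate gives 1 replica per group.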
Step4: Check whether replica specific commands were executed.
Attach to the desired replica, for example:
dstack attach --replica 2 replica-groups-test
ssh replica-groups-test-0-2 'cat /tmp/version.txt'
output: Group 1 - Version 0
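To repeat this check for several replicas, the manual step can be wrapped in a small helper. This is a rough sketch, not part of the PR: it assumes the SSH host alias always follows the pattern of the example above (`replica-groups-test-0-2`, with a fixed middle `0`), that `dstack attach` is already active for the replica being checked, and that the replica-to-group mapping is the one in the scaled-up listing. The names `ssh_host`, `check_replica`, and `EXPECTED` are hypothetical.

```python
import subprocess

RUN_NAME = "replica-groups-test"

# Replica number -> expected file content, transcribed from the scaled-up listing
# (replicas 0 and 2 belong to group 0, replicas 1, 3 and 4 to group 1).
EXPECTED = {
    0: "Group 1 - Version 0",
    2: "Group 1 - Version 0",
    1: "Group 2 - Version 0",
    3: "Group 2 - Version 0",
    4: "Group 2 - Version 0",
}


def ssh_host(run_name: str, replica: int) -> str:
    # Assumption: alias pattern copied from the example, e.g. replica-groups-test-0-2
    return f"{run_name}-0-{replica}"


def check_replica(replica: int) -> bool:
    """Read /tmp/version.txt over SSH and compare it with the expected group text."""
    result = subprocess.run(
        ["ssh", ssh_host(RUN_NAME, replica), "cat /tmp/version.txt"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == EXPECTED[replica]
```

Calling `check_replica(2)` while attached to replica 2 should return `True` if the group-specific command ran.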
Step5: Check rolling deployment.
Important: Rolling deployments are currently affected by a race condition that also affects the non-replica-group implementation and must be addressed separately (issue). When each replica group is configured with a single replica, however, the race condition does not affect rolling deployments.
Testing instructions:
Scale down each replica group to 1 replica.
Restart the load-testing script with RPS = 2.
After all groups have scaled down to a single replica, re-apply the configuration:
Re-apply
dstack apply -f replica-groups-service.yml
Active run replica-groups-test already exists. Detected changes that can be updated in-place:
- Configuration properties:
  - replica_groups
Update the run? [y/n]: y
NAME BACKEND GPU PRICE STATUS SUBMITTED
replica-groups-test - - running 07:51
group=0 replica=0 aws (us-east-2) - $0.0832 terminated 07:51
replica=2 aws (us-east-2) - $0.0832 running 07:53
group=1 replica=1 aws (us-east-2) - $0.0832 terminated 07:51
replica=3 aws (us-east-2) - $0.0832 running 07:53
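The pass condition for this step is that every group ends up with exactly one running replica and that the old replica in each group is terminated. A minimal sketch of that check follows; the rows are transcribed by hand from the listing above rather than parsed from the CLI, and the helper name `rollout_complete` is hypothetical.

```python
from collections import Counter

# (group, replica, status) rows copied from the listing above
rows = [
    (0, 0, "terminated"),
    (0, 2, "running"),
    (1, 1, "terminated"),
    (1, 3, "running"),
]


def rollout_complete(rows, groups):
    """True if each group has exactly one running replica and at least one terminated one."""
    running = Counter(group for group, _, status in rows if status == "running")
    terminated = Counter(group for group, _, status in rows if status == "terminated")
    return all(running[g] == 1 and terminated[g] >= 1 for g in groups)


print(rollout_complete(rows, groups=[0, 1]))
```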