Add replica groups in dstack-service

Open Bihan opened this issue 1 week ago • 11 comments

Steps To Test

Step1: Create replica-groups-service.yml

# replica-groups-service.yml
type: service
name: replica-groups-test
python: 3.12

replica_groups:
  - name: replica-1
    replicas: 0..2
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 1 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2

  - name: replica-2
    replicas: 0..3
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 2 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2

port: 8000

Step2: dstack apply -f replica-groups-service.yml

Step3: Run load_test_replica_groups.py, substituting your URL and TOKEN

import asyncio
import aiohttp
import time

# ==== Configuration ====
URL = "<URL>"
TOKEN = "<TOKEN>"
RPS = 8            # Requests per second
DURATION = 1800    # Duration in seconds
METHOD = "GET"     # or "POST"
# =======================

HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {TOKEN}"
}


async def send_request(session, idx):
    """Send a request and print response"""
    try:
        async with session.request(METHOD, URL, headers=HEADERS) as resp:
            text = await resp.text()
            print(f"\n[{idx}] Status: {resp.status}")
            # print small part of response (HTML preview)
            print(text[:200].strip(), "...\n")
    except Exception as e:
        print(f"[{idx}] Error: {e}")


async def run_load_test():
    total_requests = RPS * DURATION
    interval = 1.0 / RPS

    async with aiohttp.ClientSession() as session:
        start_time = time.perf_counter()
        tasks = []

        for i in range(total_requests):
            task = asyncio.create_task(send_request(session, i + 1))
            tasks.append(task)
            await asyncio.sleep(interval)

        await asyncio.gather(*tasks)
        elapsed = time.perf_counter() - start_time
        print(f"\n✅ Sent {total_requests} requests in {elapsed:.2f}s "
              f"(~{total_requests/elapsed:.2f} RPS)")


if __name__ == "__main__":
    asyncio.run(run_load_test())

Expected output: each group gets one replica.

Submit the run replica-groups-test? [y/n]: y
 NAME                  BACKEND          GPU  PRICE    STATUS   SUBMITTED 
 replica-groups-test                    -    -        running  07:31     
    group=0 replica=0  aws (us-east-2)  -    $0.0832  running  07:32     
    group=1 replica=1  aws (us-east-2)  -    $0.0832  running  07:32

Later, both groups scale out according to their own configuration: group 0 scales to 2 replicas and group 1 scales to 3.

Below is the expected output:

 NAME                  BACKEND          GPU  PRICE    STATUS   SUBMITTED  
 replica-groups-test                    -    -        running  9 mins ago 
    group=0 replica=0  aws (us-east-2)  -    $0.0832  running  8 mins ago 
            replica=2  aws (us-east-2)  -    $0.0832  running  3 mins ago 
    group=1 replica=1  aws (us-east-2)  -    $0.0832  running  8 mins ago 
            replica=3  aws (us-east-2)  -    $0.0832  running  3 mins ago 
            replica=4  aws (us-east-2)  -    $0.0832  running  3 mins ago
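The replica counts above are consistent with a simple target-tracking rule: divide the observed RPS by the per-replica target, round up, and clamp to the group's `replicas` range. The sketch below illustrates that arithmetic only. It is not dstack's autoscaler, it assumes each group is evaluated against the full service RPS, and the helper names `parse_replica_range` and `desired_replicas` are hypothetical.

```python
import math


def parse_replica_range(spec: str) -> tuple[int, int]:
    """Parse a 'min..max' range such as '0..2' into integer bounds."""
    low, high = spec.split("..")
    return int(low), int(high)


def desired_replicas(rps: float, target: float, spec: str) -> int:
    """Replicas needed to keep per-replica RPS at or below target, clamped to the range.

    Assumption: plain ceil(rps / target); real autoscalers also apply delays and smoothing.
    """
    low, high = parse_replica_range(spec)
    wanted = math.ceil(rps / target)
    return max(low, min(high, wanted))


# Groups from replica-groups-service.yml: name -> (replicas range, rps target)
GROUPS = {"replica-1": ("0..2", 2), "replica-2": ("0..3", 2)}

for name, (spec, target) in GROUPS.items():
    # 8 is the RPS value used by the load-test script
    print(name, desired_replicas(8, target, spec))
```

Under these assumptions, 8 RPS against a target of 2 asks for 4 replicas, which is clamped to 2 for the first group and 3 for the second. The same rule gives 1 replica per group at 2 RPS.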

Step4: Check whether the replica-specific commands were executed. Attach to the desired replica and read the version file, e.g.:

dstack attach --replica 2 replica-groups-test
ssh replica-groups-test-0-2 'cat /tmp/version.txt'

Output:

Group 1 - Version 0
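To repeat this check for every replica, a small helper can generate the `ssh` command and the text expected from each one. This is a minimal sketch: the host name pattern `<run>-<job>-<replica>` is inferred from the single example above and is an assumption, the replica-to-group mapping is copied from the scaled-out status table, and `check_command` is a hypothetical name.

```python
import shlex

RUN_NAME = "replica-groups-test"

# Text each group writes to /tmp/version.txt, keyed by the group index shown
# in the status table (group=0 is the first entry in replica_groups).
EXPECTED_BY_GROUP = {0: "Group 1 - Version 0", 1: "Group 2 - Version 0"}

# Replica number -> group index, copied from the scaled-out status table.
REPLICA_TO_GROUP = {0: 0, 2: 0, 1: 1, 3: 1, 4: 1}


def check_command(replica: int, job: int = 0) -> str:
    """Build the ssh command that prints the version file of one replica.

    Assumption: the SSH host alias is '<run>-<job>-<replica>'.
    """
    host = f"{RUN_NAME}-{job}-{replica}"
    return shlex.join(["ssh", host, "cat /tmp/version.txt"])


for replica, group in sorted(REPLICA_TO_GROUP.items()):
    print(f"{check_command(replica)}  # expect: {EXPECTED_BY_GROUP[group]}")
```

Each replica still has to be attached with `dstack attach` before its host alias resolves. The sketch only prints the commands and does not run them.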

Step5: Check rolling deployment.

Important: Rolling deployments are currently affected by a race condition that also impacts the non-replica-group implementation and must be addressed separately (issue). However, when each replica group is configured with a single replica, this race condition does not affect rolling deployments.

Testing instructions:

Scale down each replica group to 1 replica.

Restart the load-testing script with RPS = 2.

After all groups have scaled down to a single replica, re-apply the configuration:

dstack apply -f replica-groups-service.yml

Active run replica-groups-test already exists. Detected changes that can be updated in-place:
- Configuration properties:
  - replica_groups

Update the run? [y/n]: y
 NAME                  BACKEND          GPU  PRICE    STATUS      SUBMITTED 
 replica-groups-test                    -    -        running     07:51     
    group=0 replica=0  aws (us-east-2)  -    $0.0832  terminated  07:51     
            replica=2  aws (us-east-2)  -    $0.0832  running     07:53     
    group=1 replica=1  aws (us-east-2)  -    $0.0832  terminated  07:51     
            replica=3  aws (us-east-2)  -    $0.0832  running     07:53     
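One way to confirm the rollout is to check that each group ends up with exactly one running replica. The sketch below parses a status table in the layout shown above. It assumes columns are separated by two or more spaces and that STATUS is the fifth column of a replica row. `running_by_group` is a hypothetical helper, not part of dstack.

```python
import re
from collections import defaultdict

# Sample copied from the status table printed after the in-place update.
SAMPLE = """\
 NAME                  BACKEND          GPU  PRICE    STATUS      SUBMITTED
 replica-groups-test                    -    -        running     07:51
    group=0 replica=0  aws (us-east-2)  -    $0.0832  terminated  07:51
            replica=2  aws (us-east-2)  -    $0.0832  running     07:53
    group=1 replica=1  aws (us-east-2)  -    $0.0832  terminated  07:51
            replica=3  aws (us-east-2)  -    $0.0832  running     07:53
"""


def running_by_group(table: str) -> dict[int, list[int]]:
    """Map each group index to the replica numbers whose STATUS is 'running'."""
    running = defaultdict(list)
    group = None
    for line in table.splitlines():
        cells = re.split(r"\s{2,}", line.strip())
        match = re.fullmatch(r"(?:group=(\d+) )?replica=(\d+)", cells[0])
        if match is None:
            continue  # header, run-level row, or blank line
        if match.group(1) is not None:
            group = int(match.group(1))  # rows without group= inherit the previous group
        if cells[4] == "running":
            running[group].append(int(match.group(2)))
    return dict(running)


print(running_by_group(SAMPLE))
```

For the sample table, the old replicas 0 and 1 are terminated and replicas 2 and 3 have replaced them, so each group maps to a single running replica.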

Bihan, Dec 20 '25 03:12