Bihan Rana
**Summary**

After setting the compute partition to `CPX` and the memory partition to `NPS4`, only 8 GPUs **(indices 0, 8, 16, 24, 32, 40, 48, 56)** show a valid `COMPUTE_PARTITION: CPX` and `MEMORY_PARTITION:`...
**Steps To Test**

Step 1: Create `replica-groups-service.yml`

```
# replica-groups-service.yml
type: service
name: replica-groups-test

python: 3.12

replica_groups:
  - name: replica-1
    replicas: 0..2
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group...
```
### Steps to reproduce

Configs:

```
# my_cpu_fleet.yml
type: fleet
name: cpu-default

nodes: 0..8

resources:
  cpu: 2
```

```
# simple-service-replicas.yml
type: service
name: simple-service-replicas

https: false
python: 3.12

commands:...
```
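Assuming the standard dstack CLI workflow, the fleet config is applied first and the service second, e.g. `dstack apply -f my_cpu_fleet.yml` followed by `dstack apply -f simple-service-replicas.yml`, after which replica behavior can be observed with `dstack ps` (invocation order assumed; adjust to the actual repro steps).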
This issue tracks the roadmap for implementing native inference capabilities inside dstack. Current LLM inference systems (SGLang, Dynamo, Grove, LLM-d, AIBrix, SGLang OME) revolve around inference-native concepts: TTFT/ITL autoscaling, PD (prefill/decode) disaggregation,...
### Problem

Time To First Token (TTFT) and Inter-Token Latency (ITL) directly reflect user experience:

- **TTFT:** time until the first token appears (responsiveness)
- **ITL:** time between subsequent tokens (generation speed)...
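For context, dstack services currently autoscale on request rate (`metric: rps`, as in the replica-groups config above). Below is a minimal sketch of what a latency-based policy could look like under this proposal; note that `ttft` is not an existing dstack scaling metric, and both the metric name and the target value here are hypothetical.

```
# Hypothetical sketch only: `ttft` is NOT an existing dstack scaling metric.
# Illustrates what latency-based autoscaling could look like under this proposal.
scaling:
  metric: ttft   # hypothetical metric name (time to first token)
  target: 0.5    # hypothetical target, e.g. 500 ms to first token
```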
### Problem

Currently, the `env:` configuration does not support variable interpolation. This means that when we define environment variables like:

```
env:
  - NUM_SHARD=$DSTACK_GPUS_NUM
```

the value is not evaluated...
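One possible workaround, assuming `DSTACK_GPUS_NUM` is exposed as an environment variable inside the run's shell (as the example above implies), is to perform the interpolation in `commands`, where shell expansion does happen. A minimal sketch:

```
# Sketch of a possible workaround (not a confirmed dstack feature):
# let the shell expand the system variable inside `commands` instead of `env:`.
commands:
  - export NUM_SHARD=$DSTACK_GPUS_NUM   # shell expansion happens here
  - echo "NUM_SHARD is $NUM_SHARD"
```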