
A large-scale simulation framework for LLM inference

Results: 34 vidur issues

**Issue** `MetricsStore` uses 1-indexing for `replica_id` in several places even though `replica_id`s are 0-indexed. **Fix** Update the 1-indexed usage of `replica_id` in `_replica_memory_usage`, `_replica_busy_time`, and `_replica_mfu` inside `MetricsStore` to 0-indexed.
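To make the off-by-one concrete, here is a minimal, hypothetical sketch (the class body below is illustrative, not Vidur's actual `MetricsStore` implementation) of why a 1-indexed lookup fails once `replica_id`s are assigned 0-indexed:

```python
class MetricsStore:
    """Toy stand-in for a per-replica metrics store."""

    def __init__(self, num_replicas: int):
        # replica_ids are 0-indexed, matching how the scheduler assigns them.
        self._busy_time = {replica_id: 0.0 for replica_id in range(num_replicas)}

    def add_busy_time(self, replica_id: int, seconds: float) -> None:
        # A 1-indexed variant would look up `replica_id + 1` here and raise
        # KeyError for the last replica; use the 0-indexed id directly.
        self._busy_time[replica_id] += seconds


store = MetricsStore(num_replicas=2)
store.add_busy_time(0, 1.5)
store.add_busy_time(1, 0.5)
print(store._busy_time)  # → {0: 1.5, 1: 0.5}
```

With 1-indexed keys, `add_busy_time(1, ...)` on a 2-replica store would silently update the wrong replica's counter or fail outright, which is exactly the class of bug the proposed fix removes.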

Hi Vidur Team! I am interested in the profiling phase, but I found something that confuses me. In `vidur/execution_time_predictor/sklearn_execution_time_predictor.py:494`:

```python
if self._replica_config.num_pipeline_stages > 1:
    send_recv_df = self._load_send_recv_df(self._send_recv_input_file)
    send_recv_df = self._get_send_recv_df_with_derived_features(send_recv_df)
    models["send_recv"]...
```

Hello Vidur, Thank you for sharing your work. While reading the code, I encountered a question. I am analyzing the profiling part of the code. The profiling is divided into...

The current default random forest predictor overfits the training set and cannot generalize its batch execution time predictions to unseen token counts or batch sizes...
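This failure mode is inherent to tree ensembles: they predict piecewise-constant values and cannot extrapolate beyond the target range seen in training. A toy illustration (synthetic data, not Vidur's actual training pipeline) using scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic profile: execution time grows linearly with token count.
X = np.arange(1, 101, dtype=float).reshape(-1, 1)  # tokens 1..100
y = 0.1 * X.ravel()                                # true time = 0.1 * tokens

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Query a token count far outside the training range. The forest's
# prediction is clamped near the largest training target (~10.0),
# nowhere near the true value of 100.0.
pred = model.predict(np.array([[1000.0]]))[0]
```

Swapping in a model with an explicit functional form (e.g. polynomial regression over token counts), or profiling a wider range of token/batch configurations, are the usual remedies for this kind of extrapolation gap.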

Hi team, I’m working on adapting Vidur, the LLM inference system simulator, to vLLM. Currently, Vidur’s profiling is based on Sarathi-Serve, but I’d like to explore how to make it...

Hi, could you please help resolve the issue below? I have already installed Sarathi-Serve, but I still hit this error: `python vidur/profiling/attention/main.py --models codellama/CodeLlama-34b-Instruct-hf --num_gpus 4` Traceback...

Running `python /app/software1/vidur/vidur/profiling/collectives/main.py --num_workers_per_node_combinations 2 --collective send_recv` gives: `2025-02-14 07:59:21,469 INFO worker.py:1821 -- Started a local Ray instance.` `0%| | 0/994 [00:00`

`python -m vidur.main --replica_config_device a100 --replica_config_model_name meta-llama/Meta-Llama-3-8B --cluster_config_num_replicas 1 --replica_config_tensor_parallel_size 1 --replica_config_num_pipeline_stages 1 --request_generator_config_type synthetic --synthetic_request_generator_config_num_requests...`

I am studying how Vidur determines bottlenecks, and I noticed that both `TFFTViolationLowMaxBatchSizeCase` and `TFFTViolationLowMemoryCase` involve checking `batch_size_obs`. It seems that `batch_size_obs` should be a specific batch size value, but...