vidur
A large-scale simulation framework for LLM inference
Since the 4090 really has an edge on inference with smaller models, is it possible to add data for 4090 cards? Thanks!
I am trying to study the scheduling algorithms used in the simulator. Some scheduling algorithms, like Orca, just reserve the maximum memory for requests even though it is not actually used....
The paper mentions that Vidur-Search can be used to find the best deployment method, but the README does not mention how to use it.
Found a CPU example and want to test this. It needs sarathi, but the sarathi I found has no implementation of LLMEngine. What are the requirements for CPU usage?
Has the Splitwise branch been fully updated? I noticed some logic changes in the SplitwiseGlobalScheduler, but it seems that no adjustments were made for PD separation in other areas,...
Hi, could you please help resolve the issue below with the IPython.core.display module? Setup: mamba virtual env at /home/idps/vidur/vidur-venv. I configured wandb and set the variable WANDB_BASE_URL to a local web server with...
Hello! There is a statement in the README file: "The simulator supports a plethora of parameters for the simulation description which can be found [here](https://github.com/microsoft/vidur/blob/main/docs/launch_parameters.md)." However, the link doesn't work:...
I found that the `_get_block_execution_time()` function in `vidur/entities/execution_time.py` computes `add_time` only once:

```python
def _get_block_execution_time(self) -> float:
    return (
        self._get_attention_layer_execution_time()
        + self._get_mlp_layer_execution_time()
        + self._add_time
    )
```

But in other...
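The concern in the issue above can be illustrated with a small standalone sketch. The timing values here are hypothetical and this is not the simulator's actual code; it only shows how the per-block total diverges when `add_time` is charged once per block versus once after each sub-layer:

```python
# Hypothetical per-component timings (milliseconds), for illustration only.
attention_time = 2.0
mlp_time = 3.0
add_time = 0.5  # residual-add overhead

# Variant 1: add_time charged once per transformer block
# (matches the structure of _get_block_execution_time above).
block_time_once = attention_time + mlp_time + add_time

# Variant 2: add_time charged after each sub-layer (attention and MLP),
# i.e. twice per block, as the issue suggests other code paths may assume.
block_time_per_sublayer = attention_time + mlp_time + 2 * add_time

print(block_time_once)         # 5.5
print(block_time_per_sublayer)  # 6.0
```

Over a model with many layers, the gap compounds by the layer count, so which convention the simulator intends matters for accuracy.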
At the point mentioned below, the link for GPTModel is broken, so I can't get the exact config for the YAML file and hence am not able to add a new model (llama2-13b) as...