Nitin Kedia
Hi @lhpp1314, the `Supported Models` section in the project [README](https://github.com/microsoft/vidur/tree/main?tab=readme-ov-file#supported-models) has the updated link for profiling. Please use the central readme to find links to other documentation.
Hi @JasonZhang517, it is certainly possible to export the actual memory used by requests instead of the reserved memory. For scheduling algorithms that do not use dynamic memory allocation...
@AgrawalAmey I believe the question is about the amount of KV-cache memory actually used by a token, not just the memory reserved for it.
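The used-vs-reserved distinction can be sketched with a small calculation. This is a hedged illustration, not Vidur's implementation: the function names, the block size, and the model dimensions below are all illustrative assumptions.

```python
# Illustrative sketch of per-token KV-cache memory vs. block-reserved memory.
# All names and parameter values here are hypothetical, not from Vidur's code.

def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache one token occupies: a K and a V vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def reserved_bytes(num_tokens, block_size, per_token_bytes):
    """Block allocators reserve whole blocks, so reservation rounds up."""
    num_blocks = -(-num_tokens // block_size)  # ceiling division
    return num_blocks * block_size * per_token_bytes

# Example: a 32-layer model with 32 KV heads of dim 128, fp16 weights.
per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)
used = 100 * per_token                          # memory 100 tokens actually use
reserved = reserved_bytes(100, 16, per_token)   # memory reserved in 16-token blocks
print(per_token, used, reserved)
```

The gap between `used` and `reserved` is the internal fragmentation that a reservation-only metric would hide.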
Hi @EheinWang and @ozcanmiraay, we have documentation on how to run Vidur Search aka config explorer at [docs/config_explorer.md](https://github.com/microsoft/vidur/blob/main/docs/config_explorer.md). Please check it out.
@ozcanmiraay, for models other than the Llama3 ones, `scheduler_config_batch_size_cap = 128` and `request_length_generator_config_max_tokens = 4096` are the maximums. For Llama3, the maximums are 512 and 16k respectively. Some more details regarding...
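Assuming these config keys map onto CLI flags in the usual underscore-joined style (the exact flag names should be verified against the simulator's `--help` output), a Llama3 run at the stated maximums might look like:

```shell
# Hypothetical invocation; flag names mirror the config keys above and
# should be checked against `python -m vidur.main --help`.
python -m vidur.main \
  --scheduler_config_batch_size_cap 512 \
  --request_length_generator_config_max_tokens 16384
```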
Hi @spliii, create a new class for the model you want to add in [model_config.py](https://github.com/microsoft/vidur/blob/main/vidur/config/model_config.py). We previously used YAML files for configs but have since shifted to dataclasses...
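As a hedged sketch of what such a dataclass might look like: the class and field names below are illustrative assumptions and should be matched against the existing classes in `vidur/config/model_config.py`.

```python
# Hypothetical sketch of a dataclass-based model config; field names are
# illustrative, not copied from vidur/config/model_config.py.
from dataclasses import dataclass


@dataclass
class BaseModelConfig:
    num_layers: int
    num_q_heads: int
    num_kv_heads: int
    embedding_dim: int


@dataclass
class MyNewModelConfig(BaseModelConfig):
    # Defaults for the new model; override any field at construction time.
    num_layers: int = 32
    num_q_heads: int = 32
    num_kv_heads: int = 8
    embedding_dim: int = 4096


cfg = MyNewModelConfig()
print(cfg.num_kv_heads)
```

Subclassing a shared base keeps all model configs type-checked and discoverable, which is the usual motivation for moving from YAML to dataclasses.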
Hi @rajeshitshoulders, @akaashrp, and @ozcanmiraay, can you please try removing the `jupyterlab` dependency from `environment.yml`? You'll need to create a new mamba environment with the changed `environment.yml` file. `IPython` is used...
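After editing `environment.yml`, the environment can be recreated along these lines. The environment name `vidur` is an assumption; use whatever the `name:` field in your `environment.yml` says.

```shell
# Remove the old environment and rebuild it from the edited environment.yml.
# "vidur" is a placeholder for the name: field in your environment.yml.
mamba env remove --name vidur
mamba env create --file environment.yml
mamba activate vidur
```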
Hi @rajeshitshoulders, @vladandrew, @Yogaht, and @akaashrp, Vidur now uses `seaborn` (based on `matplotlib`) instead of `plotly` (which uses `kaleido`). This removes the dependency on `kaleido`, so the original error should not...
Hi @yl3469, I spent quite a bit of time on this PR because the core issue it aims to solve (extrapolating to, say, higher context lengths) makes Vidur...
@sicario001 Please look at the comments I have left and also accept the Contributor License Agreement (without which the PR cannot be merged).