foundation-model-benchmarking-tool
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.
Remove config_filepath.txt as a config reference in the developer workflow, since it has to be changed every time along with the debug.sh file. Either add clearer instructions...
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.34.0 to 1.40.29. Release notes Sourced from litellm's releases. v1.40.29 What's Changed Updates Databricks provider docs by @djliden in BerriAI/litellm#4442 [Feat] Improve secret detection call hook -...
Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.2. Release notes Sourced from requests's releases. v2.32.2 2.32.2 (2024-05-21) Deprecations To provide a more stable migration for custom HTTPAdapters impacted by the CVE changes...
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.34.0 to 1.40.0. Release notes Sourced from litellm's releases. v1.40.0 What's Changed fix: fix streaming with httpx client by @krrishdholakia in BerriAI/litellm#3944 feat(scheduler.py): add request prioritization scheduler...
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.7 to 2.2.2. Release notes Sourced from urllib3's releases. 2.2.2 🚀 urllib3 is fundraising for HTTP/2 support urllib3 is raising ~$40,000 USD to release HTTP/2 support and...
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4 to 6.4.1. Changelog Sourced from tornado's changelog. Release notes .. toctree:: :maxdepth: 2 releases/v6.4.1 releases/v6.4.0 releases/v6.3.3 releases/v6.3.2 releases/v6.3.1 releases/v6.3.0 releases/v6.2.0 releases/v6.1.0 releases/v6.0.4 releases/v6.0.3 releases/v6.0.2 releases/v6.0.1 releases/v6.0.0...
As discussed [here on StackOverflow](https://stackoverflow.com/a/44938662):
1. **Applications** should generally lock dependencies to exact versions, for reliable deployment
2. **Libraries** should generally support broad dependency version ranges where practical, to accommodate...
In this configuration, I wanted to compare the performance of two different hosting images (LMI and TGI) on the same endpoint. (I added the g5 just to get past the...
This PR contains the config file update and prompt template update for summarization use cases. This prompt template is specifically tested on large synthetic data that is meant to be...
Benchmarked model latency depends heavily on model warmup, but there is no "warmup" phase in 3_run_inference.ipynb.
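A minimal sketch of what a warmup phase could look like, assuming the inference call can be wrapped in a zero-argument callable. The names `benchmark_with_warmup` and `make_fake_endpoint` are hypothetical and do not come from the repository. The fake endpoint only simulates a cold start so the example runs without any AWS resources.

```python
import time
from typing import Callable, List


def benchmark_with_warmup(
    invoke: Callable[[], object],
    warmup_calls: int = 3,
    measured_calls: int = 10,
) -> List[float]:
    """Return per-call latencies in seconds, excluding the warmup calls."""
    # Warmup: send requests but discard their timings.
    for _ in range(warmup_calls):
        invoke()

    # Measurement: only these calls contribute to the reported latencies.
    latencies = []
    for _ in range(measured_calls):
        start = time.perf_counter()
        invoke()
        latencies.append(time.perf_counter() - start)
    return latencies


def make_fake_endpoint(
    cold_calls: int = 2, cold_delay: float = 0.2, warm_delay: float = 0.001
) -> Callable[[], int]:
    """Hypothetical stand-in for a model endpoint that is slow on its first calls."""
    state = {"calls": 0}

    def invoke() -> int:
        state["calls"] += 1
        is_cold = state["calls"] <= cold_calls
        time.sleep(cold_delay if is_cold else warm_delay)
        return state["calls"]

    return invoke
```

In a real run, `invoke` would wrap the actual endpoint request. The number of warmup calls is a tuning choice that likely depends on the serving stack and instance type.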