Jack Butler
Fixes #990. I've refactored the tokenizer configuration details out of the `get_tokenizer` function. This should hopefully:

* Make it easier to add new tokenizers without changing the logic in `get_tokenizer`...
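To illustrate the shape of such a refactor (the registry name and per-model settings below are hypothetical, not the PR's actual code), the idea is to move model-specific details into a lookup table so that adding a tokenizer is just a new entry and `get_tokenizer` itself never changes:

```python
from transformers import AutoTokenizer

# Hypothetical registry: model-specific tokenizer settings live here
# instead of inside get_tokenizer, so new models are just new entries.
TOKENIZER_CONFIGS = {
    "distilgpt2": {"padding_side": "left"},
    "theblackcat102/pythia-12B-dedup-1000": {"padding_side": "left"},
}


def get_tokenizer(model_name: str):
    """Look up any model-specific settings and build the tokenizer."""
    config = TOKENIZER_CONFIGS.get(model_name, {})
    return AutoTokenizer.from_pretrained(model_name, **config)
```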
I recently tried to set up the local inference server for testing different NLG models and found the installation documentation had some missing or incomplete information. We should update this so...
Fixes #1549. I've verified that I can set up the inference-server locally using each method with a fresh Python environment.

* Updates installation option 2 with the necessary requirements.
* Pins `fastapi`...
We want to test how many users the inference-server can serve, and with what response times, on setups with different numbers and types of GPU & CPU devices. On the...
# Overview

We want to test the dockerised inference-server under different stress conditions, such as:

1. Load testing - handling many concurrent users
2. Latency testing - speed of...
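As a starting point for the load and latency cases, a minimal Locust sketch along these lines could drive concurrent simulated users against the dockerised server. The `/work` path is taken from the endpoints mentioned below, but the payload shape is an assumption for illustration, not the server's confirmed schema:

```python
from locust import HttpUser, task, between


class InferenceUser(HttpUser):
    """One simulated user hitting the inference-server."""

    # Each simulated user waits 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task
    def request_work(self):
        # Placeholder request body; adjust to the real /work schema.
        self.client.post("/work", json={"prompt": "Hello, world"})
```

Running `locust -f stress_test.py --host http://localhost:8000` reports response-time percentiles and failure counts per concurrency level, which covers both the load and latency conditions above; Locust's `--master`/`--worker` mode would also map naturally onto the distributed runs described next.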
Deploy distributed tests on a standardised compute setup rather than running them locally.
Perform these distributed tests for different numbers of concurrent users and fix any logical problems that arise from the stress tests.
Expand the distributed tests to the `/work`, `/generate_stream`, and other endpoints in the application. We could potentially also look at endpoints outside of the inference-server.
For example, we can look at scenarios where a user starts multiple short conversations vs. one long continuous conversation.
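A minimal sketch of that comparison, assuming a local server and a guessed `/generate_stream` request schema (the `messages` field is a placeholder):

```python
import httpx

BASE_URL = "http://localhost:8000"  # assumed local inference-server address


def stream_reply(client: httpx.Client, messages: list[str]) -> None:
    """Send one conversation turn and consume the streamed reply."""
    # The request body is a guess at the /generate_stream schema.
    with client.stream(
        "POST", f"{BASE_URL}/generate_stream", json={"messages": messages}
    ) as response:
        for _ in response.iter_text():
            pass  # consume the stream; latency measurement hooks go here


def many_short_conversations(n: int) -> None:
    """Scenario A: n independent one-turn conversations."""
    with httpx.Client() as client:
        for i in range(n):
            stream_reply(client, [f"Short question {i}"])


def one_long_conversation(turns: int) -> None:
    """Scenario B: one conversation with an ever-growing history."""
    history: list[str] = []
    with httpx.Client() as client:
        for i in range(turns):
            history.append(f"Follow-up {i}")
            stream_reply(client, history)
```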
We want to test the performance of different models within the inference-server to understand how it scales with model size, such as:

* [distilgpt2](https://huggingface.co/distilgpt2)
* [pythia-12B](https://huggingface.co/theblackcat102/pythia-12B-dedup-1000)
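A rough way to compare the two, sketched directly against the `transformers` library (single-prompt wall-clock timing only; pythia-12B needs a large GPU, and a real benchmark would go through the inference-server instead):

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer


def time_generation(
    model_name: str, prompt: str = "My name is Lewis and I like to"
) -> float:
    """Wall-clock time for one short greedy generation."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=32)
    return time.perf_counter() - start


# pythia-12B will only fit on machines with substantial memory.
for name in ["distilgpt2", "theblackcat102/pythia-12B-dedup-1000"]:
    print(name, f"{time_generation(name):.2f}s")
```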