Sam Stoelinga
Seeing this warning: ``` npm install @logto/[email protected] npm warn EBADENGINE Unsupported engine { npm warn EBADENGINE package: '@silverhand/[email protected]', npm warn EBADENGINE required: { node: '^18.12.0 || ^20.9.0', pnpm:...
Fix for #257
KubeAI logs: ``` 2024/09/29 05:37:45 url: /v1/completions 2024/09/29 05:37:45 sending error response: 400: unable to parse model: unmarshal json: unexpected end of JSON input ``` Request JSON: ``` { "model":...
Autoscaling metrics: things like QPS per replica, number of replicas, and autoscaling time (e.g. time from pod request to pod ready).
Feel free to edit this description directly. Multiple models can share the same alias. Use cases: * Ability to have a single model name, e.g. llama-3.1-8b-instruct, be backed by A100,...
Add things like periodic benchmarks to the guide.
Infinity already exposes metrics, which can be used for autoscaling: https://gist.github.com/samos123/2c28ceb67494e397034e521097bcb4e0
https://github.com/vllm-project/vllm/issues/2130
About 20% of the time some tests fail, and rerunning `make test` without making code changes fixes them. This is one of the failures I observed being flaky: ``` 2024/09/22...