SeanHH86
```
deployment_config:
  autoscaling_config:
    min_replicas: 1
    initial_replicas: 1
    max_replicas: 8
    target_num_ongoing_requests_per_replica: 1.0
    metrics_interval_s: 10.0
    look_back_period_s: 30.0
    smoothing_factor: 1.0
    downscale_delay_s: 300.0
    upscale_delay_s: 90.0
  ray_actor_options:
    num_cpus: 4 # for a model deployment, we...
```
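To make the interaction between `target_num_ongoing_requests_per_replica`, `smoothing_factor`, `min_replicas`, and `max_replicas` concrete, here is a minimal sketch of how a replica target could be derived from the current load. It is an illustration under stated assumptions, not the serving framework's actual implementation: the `AutoscalingConfig` class and `desired_replicas` function are hypothetical names, and the sketch ignores `upscale_delay_s`, `downscale_delay_s`, and the metrics look-back window.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscalingConfig:
    # Hypothetical container mirroring a subset of the config keys above.
    min_replicas: int = 1
    max_replicas: int = 8
    target_num_ongoing_requests_per_replica: float = 1.0
    smoothing_factor: float = 1.0


def desired_replicas(cfg: AutoscalingConfig,
                     current_replicas: int,
                     total_ongoing_requests: float) -> int:
    """Simplified replica target; assumes current_replicas >= 1."""
    # Average load each replica is carrying right now.
    per_replica = total_ongoing_requests / current_replicas
    # How far the observed load is from the configured target (1.0 = on target).
    error_ratio = per_replica / cfg.target_num_ongoing_requests_per_replica
    # smoothing_factor < 1 dampens the reaction, > 1 amplifies it.
    smoothed = 1 + (error_ratio - 1) * cfg.smoothing_factor
    desired = math.ceil(current_replicas * smoothed)
    # Clamp to the configured bounds.
    return max(cfg.min_replicas, min(cfg.max_replicas, desired))


cfg = AutoscalingConfig()
print(desired_replicas(cfg, current_replicas=2, total_ongoing_requests=6))
print(desired_replicas(cfg, current_replicas=2, total_ongoing_requests=40))
```

With a target of 1.0 ongoing request per replica, each replica handles roughly one request at a time, so throughput under load depends heavily on how quickly new replicas are allowed to start.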
Inference speed is slow
OpenAI API: https://platform.openai.com/docs/api-reference/introduction

- Organizations and projects (optional)
```
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "OpenAI-Organization: YOUR_ORG_ID" \
  -H "OpenAI-Project: $PROJECT_ID"
```
- List models: GET https://api.openai.com/v1/models
```
curl...
```
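The same list-models request can be sketched in Python with only the standard library. The helper name `build_list_models_request` is hypothetical. The URL and header names are taken from the curl example, and the organization and project headers are optional, as noted there. The function only builds the request, so it can be inspected without a real API key.

```python
import urllib.request

OPENAI_MODELS_URL = "https://api.openai.com/v1/models"


def build_list_models_request(api_key, org_id=None, project_id=None):
    """Build (but do not send) a GET request for the list-models endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}
    # Optional headers: only needed when the key spans several orgs/projects.
    if org_id:
        headers["OpenAI-Organization"] = org_id
    if project_id:
        headers["OpenAI-Project"] = project_id
    return urllib.request.Request(OPENAI_MODELS_URL, headers=headers, method="GET")


req = build_list_models_request("sk-example", org_id="YOUR_ORG_ID")
print(req.get_method(), req.full_url)

# To actually send it (requires a valid key and network access):
#   import json
#   with urllib.request.urlopen(req) as resp:
#       models = json.load(resp)
```

Note that `urllib` normalizes header names internally (for example `Openai-organization`). HTTP header names are case-insensitive, so the server treats them the same way.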
Should work when the OpenSDK endpoint is set.
Failed to install llama_cpp_python==0.2.57 on macOS, but llama_cpp_python==0.2.56 installs successfully.
$ python -V
Python 3.10.14
$ gcc -v
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Installation also fails on Ubuntu 20.04 with Python 3.10.14, with the same issue.