gkiri

Results 6 comments of gkiri

version: '3.4'
services:
  weaviate_anon:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.24.10
    ports:
      - 8080:8080
      - 50051:50051
    restart: on-failure:0
    environment:
      OPENAI_APIKEY: $OPENAI_APIKEY
      QUERY_DEFAULTS_LIMIT:...

Works with the latest version. Thanks.

python3 -m sglang.launch_server \
  --model-path hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
  --dtype half \
  --trust-remote-code \
  --quantization marlin \
  --enable-p2p-check \
  --efficient-weight-load \
  --host 0.0.0.0 \
  --mem-fraction-static 0.875 \
  --disable-cuda-graph \
  --max-running-requests 5 \
  --port 30000 \
  --context-length 16000

python -m vllm.entrypoints.openai.api_server \
  --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
  --max_model_len 16000 \
  --gpu-memory-utilization 0.98 \
  --dtype=half \
  --enforce-eager \
  --quantization awq \
  --swap-space 4 \
  --disable-log-requests \
  --trust-remote-code \
  --enable-prefix-caching \
  --use-v2-block-manager