gkiri comments

Results: 6 comments by gkiri

text2vec-ollama is not working

version: '3.4'
services:
  weaviate_anon:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.24.10
    ports:
    - 8080:8080
    - 50051:50051
    restart: on-failure:0
    environment:
      OPENAI_APIKEY: $OPENAI_APIKEY
      QUERY_DEFAULTS_LIMIT:...

text2vec-ollama is not working

Works with the latest version. Thanks.

TTFT latency for long context (16K) is very high around 15 seconds for llama3.1 70b model. (same or worse than vLLM)

python3 -m sglang.launch_server --model-path hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --dtype half --trust-remote-code --quantization marlin --enable-p2p-check --efficient-weight-load --host 0.0.0.0 --mem-fraction-static 0.875 --disable-cuda-graph --max-running-requests 5 --port 30000 --context-length 16000

TTFT latency for long context (16K) is very high around 15 seconds for llama3.1 70b model. (same or worse than vLLM)

Please provide an input query longer than 10K tokens to observe the high TTFT latency.
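For anyone trying to reproduce this, here is a minimal sketch of how TTFT (time to first token) could be timed on the client side. It is not taken from the issue: `measure_ttft` and `fake_stream` are hypothetical names, and the fake stream only stands in for a real streaming response from the server, which you would pass in instead.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (seconds until first non-empty chunk, total seconds, chunk count)."""
    start = time.perf_counter()
    first = None
    count = 0
    for chunk in stream:
        if not chunk:
            continue  # ignore empty keep-alive chunks
        count += 1
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no tokens")
    return first, total, count


def fake_stream(prefill_s: float, n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response; the sleep mimics prefill."""
    time.sleep(prefill_s)
    for i in range(n_tokens):
        yield f"tok{i} "


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream(0.2, 5))
    print(f"TTFT={ttft:.3f}s total={total:.3f}s chunks={n}")
```

With a real server, the iterable would be the text chunks of a streamed completion for a prompt longer than 10K tokens, so the first value approximates the prefill time.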

TTFT latency for long context (16K) is very high around 15 seconds for llama3.1 70b model. (same or worse than vLLM)

python -m vllm.entrypoints.openai.api_server --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --max_model_len 16000 --gpu-memory-utilization 0.98 --dtype=half --enforce-eager --quantization awq --swap-space 4 --disable-log-requests --trust-remote-code --enable-prefix-caching --use-v2-block-manager

Great project, and thank you for open-sourcing it. For folks who would like to order a ready-made kit, can you please provide info on where to order the full Omni next kit with all 4 fisheye cameras integrated?

Any update on this?