llama-stack

Composable building blocks to build Llama Apps

Results: 360 llama-stack issues, sorted by recently updated

### Why this PR
We want to set up Weaviate as a remote vector DB provider for llama-stack.

### What is in the PR
- Add the Weaviate memory adapter to...
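For context, reaching a remote Weaviate instance from Python looks roughly like the sketch below; the cluster URL and API key are placeholders, and the adapter's actual configuration in the PR may differ.

```python
# A minimal sketch, assuming a remote Weaviate instance secured with an API
# key; the URL and key below are placeholders, not values from the PR.
import weaviate

client = weaviate.Client(
    url="https://<your-cluster>.weaviate.network",  # remote Weaviate endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key="<WEAVIATE_API_KEY>"),
)

print(client.is_ready())  # True once the remote instance is reachable
```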

CLA Signed

For testing, brought up a local instance of the llama stack and ran a few safety queries with input prompts to check. Verified that the output looks as expected.

CLA Signed

We want to set up Databricks as a remote inference provider for llama-stack. [Databricks Foundation Model APIs](https://docs.databricks.com/en/machine-learning/foundation-models/index.html) are OpenAI compatible, and we suggest using the [OpenAI client](https://docs.databricks.com/en/machine-learning/model-serving/score-foundation-models.html) to query Databricks model...
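Since the endpoints are OpenAI-compatible, a query can be sketched with the standard OpenAI client as below; the workspace URL, token, and endpoint name are placeholders rather than values from the PR.

```python
# A minimal sketch, assuming an OpenAI-compatible Databricks serving
# endpoint; the workspace host, token, and endpoint name are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="<DATABRICKS_TOKEN>",  # Databricks personal access token
    base_url="https://<workspace-host>/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-1-70b-instruct",  # example endpoint name
    messages=[{"role": "user", "content": "Hello from llama-stack!"}],
)
print(response.choices[0].message.content)
```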

CLA Signed

After launching the distribution server with `llama distribution start --name local-llama-8b --port 5000 --disable-ipv6`, running any inference example, e.g. `python examples/scripts/vacation.py localhost 5000 --disable-safety`, gives the following...

Trying to run inference with FP8 quantization, and got the following error:

```
Configuring API surface: inference
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required): Meta-Llama3.1-8B-Instruct
Enter value for quantization (optional): ...
```

Trying to run inference with the FP8 version of the Llama 3.1 405B model (Meta-Llama3.1-405B-Instruct). The model was downloaded with `llama download --source huggingface --model-id Meta-Llama3.1-405B-Instruct --hf-token TOKEN`. However, the command `llama...

Describe the bug: The model IDs for several of the 405B models include colons, making them incompatible with Windows systems. Example: `Meta-Llama3.1-405B-Instruct:bf16-mp8` raises OSError: [WinError 123] The filename, directory name, or...
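A hypothetical workaround (not from the issue) is to strip the characters Windows forbids in file names before using a model ID as a path component:

```python
# Hypothetical helper, not part of llama-stack: replace characters that are
# invalid in Windows file names (':' among them) with a safe separator.
import re

def sanitize_model_id(model_id: str) -> str:
    return re.sub(r'[:<>"/\\|?*]', "-", model_id)

print(sanitize_model_id("Meta-Llama3.1-405B-Instruct:bf16-mp8"))
# -> Meta-Llama3.1-405B-Instruct-bf16-mp8
```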

Hi dear team! I love your work. I wanted to ask: how should one report security bugs / vulnerabilities? I would like to report a security vulnerability that I...

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"
llm = HuggingFaceLLM(
    context_window=8192,  # 4096
    max_new_tokens=512,
    generate_kwargs={"temperature": 0, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model_id,
    model_name=model_id,
    device_map="auto",
    tokenizer_kwargs={"max_length": 8192},  # ...
```