llama-stack
Composable building blocks to build Llama Apps
# Context

- Dell TGI's endpoint ([registry.dell.huggingface.co/enterprise-dell-inference-meta-llama-meta-llama-3.1-8b-instruct](http://registry.dell.huggingface.co/enterprise-dell-inference-meta-llama-meta-llama-3.1-8b-instruct))

  ```
  {'model_id': '/model', 'model_sha': None, 'model_dtype': 'torch.float16', 'model_device_type': 'cuda', 'model_pipeline_tag': None, ...}
  ```

- Official TGI ([ghcr.io/huggingface/text-generation-inference:latest](http://ghcr.io/huggingface/text-generation-inference:latest))

  ```
  {'model_id': 'meta-llama/Llama-3.1-8B-Instruct', 'model_sha': '0e9e39f249a16976918f6564b8830bc894c89659', ...}
  ```
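The contrast above can be checked programmatically. A minimal sketch (the field names are taken from the responses quoted above; the helper name is hypothetical) that flags a TGI endpoint whose reported `model_id` is a container mount path rather than a Hugging Face repo id:

```python
def model_id_looks_opaque(info: dict) -> bool:
    """Return True when a TGI info response reports a filesystem path
    (e.g. '/model') instead of a Hugging Face repo id ('org/name')."""
    model_id = info.get("model_id", "")
    # Repo ids look like 'org/name'; mount paths start with '/'.
    return model_id.startswith("/") or "/" not in model_id

# The responses quoted above:
dell_info = {"model_id": "/model", "model_sha": None}
official_info = {"model_id": "meta-llama/Llama-3.1-8B-Instruct",
                 "model_sha": "0e9e39f249a16976918f6564b8830bc894c89659"}

print(model_id_looks_opaque(dell_info))      # True
print(model_id_looks_opaque(official_info))  # False
```

In practice the `info` dict would come from an HTTP GET against the server rather than being hard-coded.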
You can now pass a `platform` parameter in the command args: `llama stack build --template local --image-type docker --platform "linux/amd64" --name llama-stack`. Resolves #253.
Adds Cerebras Inference as an API provider. The other providers appear to use the [legacy OpenAI completions API](https://platform.openai.com/docs/guides/completions), but we prefer the [newer chat completions API](https://platform.openai.com/docs/guides/text-generation). As a result...
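The practical difference between the two APIs is the input shape: legacy completions take one flat prompt string, while chat completions take a list of role-tagged messages. A hedged sketch of the kind of flattening an adapter might do when a backend only exposes the legacy API (illustrative only, not the conversion code of any actual provider):

```python
def messages_to_prompt(messages: list[dict]) -> str:
    """Flatten chat-completion style messages into a single prompt string,
    for backends that only accept the legacy completions format.
    (Hypothetical helper; real adapters usually apply a model-specific
    chat template instead of this generic layout.)"""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # cue the model to produce the reply
    return "\n".join(lines)

prompt = messages_to_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

Preferring the chat API avoids this lossy flattening, since the provider can apply the model's own chat template server-side.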
Added implementations for previously unimplemented agent methods.
Follows the Weaviate adapter implementation: https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/adapters/memory/weaviate
# Test

#### --list-templates

#### ollama docker

```
llama-stack/llama_stack/distribution/docker/ollama$ ls
compose.yaml  ollama-run.yaml
llama-stack/llama_stack/distribution/docker/ollama$ docker compose up
```
- Changed the return value of `resolve_impls_for_test` in `resolver.py` to expose the persistence store object, and updated the other test files accordingly to avoid a `KeyError`. The `delete_agent_and_session` and `get_agent_turn_and_steps` tests...
```
$ llama download --source meta --model-id Llama3.2-3B-Instruct --meta-url "https://llama3-2-lightweight.llamameta.net/*?some_stuff_I_do_not_know_if_it_is_safe_to_make_public_so_replacing_with_this_phrase&Download-Request-ID=1259948918685929"
Downloading `checklist.chk`...
Already downloaded `C:\Users\whiteSkar\.llama\checkpoints\Llama3.2-3B-Instruct\checklist.chk`, skipping...
Downloading `tokenizer.model`...
Already downloaded `C:\Users\whiteSkar\.llama\checkpoints\Llama3.2-3B-Instruct\tokenizer.model`, skipping...
Downloading `params.json`...
Already downloaded `C:\Users\whiteSkar\.llama\checkpoints\Llama3.2-3B-Instruct\params.json`, skipping...
Downloading `consolidated.00.pth`...
...
```
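The "Already downloaded ..., skipping" behavior above suggests a simple existence check before each file transfer. A minimal sketch of that idea (the helper name and size check are assumptions; the real CLI may also verify checksums from `checklist.chk`):

```python
import os

def should_skip(path: str, expected_size: int) -> bool:
    """Skip re-downloading when the destination file already exists
    with the expected size. (Illustrative sketch of resume logic,
    not the downloader's actual implementation.)"""
    return os.path.exists(path) and os.path.getsize(path) == expected_size

# Hypothetical usage inside a download loop:
# if should_skip(dest_path, size_from_manifest):
#     print(f"Already downloaded `{dest_path}`, skipping...")
# else:
#     fetch(url, dest_path)
```

A size check alone can accept a truncated-then-padded file, which is why a checksum pass is the safer final step.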
Docs are getting built here for now: https://llama-stack.readthedocs.org/