Eric Curtin
This creates a model swapping layer between the client and the inference server.

## Summary by Sourcery

New Features:
- Create a model swapping intermediary layer in the serving core
### Issue Description

```
$ bin/ramalama ls
Starting importing AI models to new store...
Imported /Users/ecurtin/.local/share/ramalama/models/ollama/smollm:135m -> /Users/ecurtin/.local/share/ramalama/store/ollama/smollm/smollm/snapshots/sha256-e0a34469bc46d5348026979b54d997a0be594c1edfd36928bb472d34f5a1f117/smollm:135m
Imported /Users/ecurtin/.local/share/ramalama/models/ollama/smollm:1.7b -> /Users/ecurtin/.local/share/ramalama/store/ollama/smollm/smollm/snapshots/sha256-fa9c77c540510d4ac24593497c6a56df798617c9eb94bd0e9434aea9d51c9b76/smollm:1.7b
NAME                          MODIFIED                 SIZE
ollama://smollm/smollm:135m   Less than a second...
```
Would behave like:

```
ramalama rag --watchdir /somedir
```

On changes to files in the directory, the new data would automatically be added to the vector DB.
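A minimal sketch of what the `--watchdir` behaviour could look like, using only stdlib polling. The `ingest` callback stands in for the actual "chunk, embed, insert into the vector DB" step, which is not specified here; the function names are hypothetical, not RamaLama API.

```python
import os
import time


def scan_mtimes(watchdir):
    """Snapshot {path: mtime} for every regular file under watchdir."""
    mtimes = {}
    for root, _dirs, files in os.walk(watchdir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtimes[path] = os.stat(path).st_mtime
            except FileNotFoundError:
                pass  # file vanished between listing and stat
    return mtimes


def changed_files(before, after):
    """Paths that are new or modified between two snapshots."""
    return [p for p, m in after.items() if before.get(p) != m]


def watch(watchdir, ingest, interval=2.0, iterations=None):
    """Poll watchdir; call ingest(path) for each new/changed file."""
    before = scan_mtimes(watchdir)
    count = 0
    while iterations is None or count < iterations:
        time.sleep(interval)
        after = scan_mtimes(watchdir)
        for path in changed_files(before, after):
            ingest(path)  # hypothetical: add this file's data to the vector DB
        before = after
        count += 1
```

A real implementation would likely use inotify (or the `watchdog` package) rather than polling, but the snapshot-diff shape of the loop is the same.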
We wrote an Ollama puller from scratch in python3; it's essentially two HTTP requests. But the final request is like a "podman artifact pull"; explore whether we can replace this...
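A sketch of that two-request flow, assuming the default `registry.ollama.ai` endpoint and the OCI-distribution-style URL layout it serves (fetch the manifest, then fetch each layer blob by digest). This is an illustration, not the actual RamaLama puller.

```python
import json
import urllib.request

REGISTRY = "https://registry.ollama.ai"  # assumed default Ollama registry


def manifest_url(model, tag):
    """Request 1: the manifest for library/<model>:<tag>."""
    return f"{REGISTRY}/v2/library/{model}/manifests/{tag}"


def blob_url(model, digest):
    """Request 2 (per layer): the blob addressed by its digest."""
    return f"{REGISTRY}/v2/library/{model}/blobs/{digest}"


def layer_digests(manifest):
    """Extract layer digests from a parsed manifest dict."""
    return [layer["digest"] for layer in manifest.get("layers", [])]


def pull(model, tag):
    """Fetch the manifest, then each referenced blob."""
    with urllib.request.urlopen(manifest_url(model, tag)) as resp:
        manifest = json.load(resp)
    blobs = {}
    for digest in layer_digests(manifest):
        with urllib.request.urlopen(blob_url(model, digest)) as resp:
            blobs[digest] = resp.read()
    return manifest, blobs
```

The blob-fetching loop is exactly the part that resembles an OCI artifact pull, which is why delegating it to `podman artifact pull` is worth exploring.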
A decent reference implementation: https://github.com/exo-explore/exo
Often, on a system without a GPU enabled, if you attempt to run a Vulkan workload you get something like this: `ggml_vulkan: Warning: Device type is CPU. This is probably...
There are two main model-as-OCI-artifact types outside of RamaLama: Ollama and Docker Model Runner. For Ollama we did the OCI pulling in python3 with one of the container...
**Describe the enhancement** I would like to build two different types of initramfs images on a Linux distribution: one "fat" version built simply via "dracut -f", and another slimmed-down version,...
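A rough sketch of how the two builds could be invoked with existing dracut options; the exact module names to omit are examples only and depend on the host.

```shell
# Fat image: include everything dracut pulls in by default.
dracut -f /boot/initramfs-fat.img "$(uname -r)"

# Slim image: host-only mode, plus dropping modules this box does not need.
# The omitted modules are illustrative; see `dracut --list-modules` for yours.
dracut -f --hostonly --omit "network plymouth" /boot/initramfs-slim.img "$(uname -r)"
```

The enhancement would presumably be about making both variants first-class build targets rather than two ad-hoc invocations.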