Enabling Efficient Model and Container Image Distribution in LLMaz with Dragonfly
~~This task is part of the OSPP program; see https://summer-ospp.ac.cn/org/prodetail/257c80106?list=org&navpage=org for details.~~ The OSPP tag has been removed because no one applied for this task.
(1) Background: llmaz is a lightweight Kubernetes-based inference platform focused on efficient deployment and inference of large language models (https://github.com/InftyAI/llmaz). Dragonfly is an open-source P2P file-distribution and image-acceleration system for cloud-native environments that improves model and image distribution efficiency. llmaz has integrated Manta as a lightweight model-caching system, but its support for image and model distribution needs further optimization.

(2) Existing Work: llmaz supports multiple model providers (e.g., HuggingFace) and inference backends (e.g., vLLM), with Manta providing model caching and distribution. Manta leverages P2P technology for model sharding and preheating, but it covers only models, not container images, and its functionality is still being refactored.
What would you like to be added:
(4) Desired Improvements: Integrate Dragonfly to optimize llmaz’s image and model distribution, supporting unified P2P caching and acceleration. Following Manta’s lightweight design, the Dragonfly integration should keep resource usage low while improving speed and stability.

(5) Ultimate Goal: Use Dragonfly to implement efficient image and model distribution for llmaz, strengthen P2P caching and acceleration, and build a lightweight, versatile solution modeled on Manta that improves deployment efficiency and reduces resource costs.
Why is this needed:
llmaz currently lacks efficient container image distribution support. Model distribution relies on Manta, which is incomplete and does not handle images. Dragonfly’s P2P distribution capabilities are not yet integrated, which slows image and model loading and hurts deployment efficiency.
Completion requirements:
This enhancement requires the following artifacts:
- [ ] Design doc
- [ ] API change
- [ ] Docs update
The artifacts should be linked in subsequent comments.
- Integrate Dragonfly into llmaz for P2P distribution of images and models.
- Develop lightweight Dragonfly configuration for efficient caching and acceleration.
- Provide a unified interface for image and model distribution management in llmaz.
- Optimize llmaz deployment speed using Dragonfly and generate performance reports.
- Write Dragonfly integration documentation and produce deployment test reports.
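One of the deliverables above is a unified interface for image and model distribution. As a purely illustrative sketch (every type and method name below is a hypothetical assumption for discussion, not part of the llmaz or Dragonfly codebases), such an abstraction in Go might look like:

```go
package main

import "fmt"

// ArtifactKind distinguishes what is being distributed.
// These names are hypothetical and for illustration only.
type ArtifactKind string

const (
	KindModel ArtifactKind = "model"
	KindImage ArtifactKind = "image"
)

// Distributor is a hypothetical unified abstraction that both a
// Dragonfly-backed implementation and a plain HTTP fallback could satisfy.
type Distributor interface {
	// Preheat asks the backend to pre-distribute an artifact across nodes.
	Preheat(kind ArtifactKind, ref string) error
	// Name identifies the backend for logging and metrics.
	Name() string
}

// noopDistributor is a trivial stand-in that only logs; it exists so this
// sketch is runnable. A real integration would call Dragonfly's APIs.
type noopDistributor struct{}

func (noopDistributor) Preheat(kind ArtifactKind, ref string) error {
	fmt.Printf("preheat %s %q (no-op)\n", kind, ref)
	return nil
}

func (noopDistributor) Name() string { return "noop" }

func main() {
	var d Distributor = noopDistributor{}
	// The same interface handles both models and container images,
	// which is the "unified" property the deliverable asks for.
	_ = d.Preheat(KindModel, "Qwen/Qwen2-7B")
	_ = d.Preheat(KindImage, "docker.io/vllm/vllm-openai:latest")
}
```

The design intent is that llmaz code depends only on the interface, so Manta, Dragonfly, or a no-op backend can be swapped without touching callers.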
@carlory will be the mentor of this task.
Updated.
Remove the OSPP tag as no one applied for this task.
/help
BTW, https://github.com/CloudNativeAI/model-spec was announced and we may take it into consideration as well. This model spec was mentioned in recent KubeCon China https://www.youtube.com/watch?v=dYVgmr7S-rE&list=PLj6h78yzYM2P1xtALqTcCmRAa6142uERl&index=17.
Hi, @pacoxu I previously submitted an application for the OSPP, but due to the one-project-per-person rule, I applied to another project, which unfortunately wasn't accepted. I noticed that this project may not yet have an accepted contributor. I actually learned about llmaz early on while preparing my OSPP application, and have been very interested in it ever since. Although I’m not yet very familiar with the codebase, I’m actively studying it and will get up to speed as quickly as possible. I have relevant experience with Kubernetes, Go, and LLMs. I'd love to contribute voluntarily, even outside the official program. I’d truly appreciate any opportunity to contribute and, if possible, to learn under your guidance. May I join and help out with the project? Thank you for your time!
That would be really great!
> May I join and help out with the project? Thank you for your time!
Of course. Great!
@jiahuipaung You’re very welcome to pick up this task, and we’d be happy to answer questions and mentor you along the way.
Just a heads-up: because our OSPP task has already been closed due to a lack of applicants, work on this task would be treated as a regular, voluntary open-source contribution rather than part of the Open Source Promotion Plan (OSPP). There would be no OSPP stipend or formal evaluation attached.
If that still sounds good, feel free to dive in—open a WIP PR or start a discussion thread whenever you’re ready, and we’ll guide you through the next steps.
Looking forward to collaborating!
Thanks for the clarification! I'm currently drafting a shared document to organize my project plan, learning notes, and progress updates. I'll also open a WIP PR soon to start tracking my work on this task.