Cloud-edge collaborative speculative decoding for LLM based on KubeEdge-Ianvs
- Description:
- The autoregressive decoding mode of LLMs means that tokens can only be generated serially, which limits inference speed. Speculative decoding uses a small draft model to propose multiple tokens that the target LLM then verifies in parallel, improving inference speed without loss of accuracy (a minimal sketch of this draft-and-verify loop follows this list). However, existing speculative decoding techniques do not consider application in cloud-edge distributed environments. This project aims to implement cloud-edge collaborative speculative decoding based on KubeEdge-Ianvs, an open-source cloud-edge collaborative distributed machine learning platform, to further improve LLM inference speed in cloud-edge environments.
- Expected outcome:
- Implement an example of cloud-edge collaborative speculative decoding based on KubeEdge-Ianvs platform.
- (Optional) Propose a more efficient cloud-edge collaborative speculative decoding algorithm.
- Recommended Skills:
- Familiar with LLM-related technologies, with experience deploying open-source LLMs locally.
- Proficient in Python and PyTorch.
- Have experience in deploying KubeEdge-Ianvs.
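For reference, below is a minimal sketch of the draft-and-verify loop behind speculative decoding, assuming Hugging Face `transformers`, greedy decoding, and two illustrative models that share a vocabulary (`gpt2` as the draft, `gpt2-large` as the target); the actual models, draft length `GAMMA`, and acceptance rule for this project are not fixed by this issue.

```python
# Minimal sketch of speculative decoding (greedy case). Model names, GAMMA, and
# the greedy acceptance rule are illustrative assumptions, not project decisions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DRAFT_NAME = "gpt2"          # small, fast draft model (assumption)
TARGET_NAME = "gpt2-large"   # large target model; must share the draft's vocabulary
GAMMA = 4                    # number of tokens drafted per round

tokenizer = AutoTokenizer.from_pretrained(TARGET_NAME)
draft = AutoModelForCausalLM.from_pretrained(DRAFT_NAME).eval()
target = AutoModelForCausalLM.from_pretrained(TARGET_NAME).eval()

@torch.no_grad()
def speculative_generate(prompt: str, max_new_tokens: int = 64) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    produced = 0
    while produced < max_new_tokens:
        # 1) Draft model proposes GAMMA tokens autoregressively (cheap, serial).
        draft_ids = ids
        for _ in range(GAMMA):
            next_tok = draft(draft_ids).logits[:, -1, :].argmax(dim=-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=-1)
        proposal = draft_ids[:, ids.shape[1]:]                     # [1, GAMMA]

        # 2) Target model scores all drafted positions in ONE forward pass.
        tgt_logits = target(draft_ids).logits
        tgt_pred = tgt_logits[:, ids.shape[1] - 1:, :].argmax(-1)  # [1, GAMMA + 1]

        # 3) Accept the longest prefix where draft and target agree, then append
        #    the target's own token at the first mismatch (or a bonus token if
        #    everything was accepted), so the output matches target greedy decoding.
        n_accept = 0
        for i in range(GAMMA):
            if proposal[0, i].item() != tgt_pred[0, i].item():
                break
            n_accept += 1
        ids = torch.cat([ids, proposal[:, :n_accept],
                         tgt_pred[:, n_accept:n_accept + 1]], dim=-1)
        produced += n_accept + 1
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(speculative_generate("Speculative decoding speeds up LLM inference because"))
```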
Hi @hsj576, I am kairvee and would like to take up this project under LFX mentorship, as it aligns with my interests and skills. Could you assign it to me and let me know the further details?
The details of the project will be discussed in the weekly SIG AI meeting. Feel free to join us at https://zoom.us/j/4167237304 every Thursday at 16:30 UTC+8.
Okay sure, thank you!
Hello @hsj576, my name is Temi and I am an open-source developer. I have a genuine interest in working on this project this fall.
It seems I missed the meeting; I wasn't familiar with the time zone.
Hi @hsj576, I need your review and guidance on the way forward for contributing.
Proposed Approach
- Speculative Decoding Implementation: I plan to set up a basic speculative decoding pipeline that uses a draft model to parallelize the decoding process, thus improving the inference speed of the LLM.
- Cloud-Edge Architecture: For cloud-edge collaboration, I will deploy the draft model at the edge to handle initial predictions and the full model in the cloud for verification and refinement. This setup aims to optimize resource usage and reduce latency (a sketch of this split follows the list).
- Testing and Optimization: I will benchmark the system to evaluate performance improvements and ensure that the solution does not compromise accuracy. I will also explore possible optimizations to enhance the efficiency of the cloud-edge collaboration.
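To make the intended division of labour concrete, here is a minimal sketch of the split described above, using the same draft-and-verify idea as in the issue description: the edge hosts the small draft model and proposes tokens, while the cloud hosts the full model and verifies them in a single forward pass. The plain call to `cloud_verify(...)` stands in for whatever transport (HTTP/gRPC or the Ianvs joint-inference interface) a real example would use; the model names and draft length are illustrative assumptions.

```python
# Sketch of the proposed edge/cloud role split for speculative decoding.
# The direct function call below is a placeholder for the actual edge-to-cloud
# transport; models and the draft length gamma are illustrative assumptions.
from typing import List, Tuple
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
edge_draft_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()          # runs at the edge
cloud_target_model = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()  # runs in the cloud

@torch.no_grad()
def edge_propose(context: List[int], gamma: int = 4) -> List[int]:
    """Edge: cheaply draft `gamma` candidate tokens with the small model."""
    ids = torch.tensor([context])
    for _ in range(gamma):
        next_tok = edge_draft_model(ids).logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_tok], dim=-1)
    return ids[0, len(context):].tolist()

@torch.no_grad()
def cloud_verify(context: List[int], proposal: List[int]) -> Tuple[List[int], int]:
    """Cloud: one forward pass of the full model over context + proposal;
    accept the agreeing prefix and return it plus the target's correction token."""
    ids = torch.tensor([context + proposal])
    preds = cloud_target_model(ids).logits[:, len(context) - 1:, :].argmax(dim=-1)[0].tolist()
    n_accept = 0
    for drafted, verified in zip(proposal, preds):
        if drafted != verified:
            break
        n_accept += 1
    return proposal[:n_accept] + [preds[n_accept]], n_accept

# One collaboration round: edge drafts, cloud verifies, edge extends its context.
context = tokenizer("Edge-cloud collaboration can", return_tensors="pt").input_ids[0].tolist()
proposal = edge_propose(context)
new_tokens, accepted = cloud_verify(context, proposal)
context += new_tokens
print(f"accepted {accepted}/{len(proposal)} draft tokens:", tokenizer.decode(context))
```

Benchmarking this split would then mainly measure how many draft tokens the cloud accepts per round versus the round-trip cost of each verification call.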
@hsj576 Are there any pre-tests to submit? I am interested in contributing.
@hsj576 I'm Aryan. I would like to take this project under LFX mentorship as it aligns perfectly with my skills. Are there any pretests to submit?
@hsj576, hello sir, I am Siddhant. I would like to work on this project under LFX mentorship, as working on LLMs has been my aim and this opportunity is a great way to kick-start my journey in open source. I also missed the weekly meeting and would like to know more about the project and how I can help. It would also be very kind of you to share some resources so we can better prepare for the project.
I will release a pretest next week.
Hello @hsj576, the application for mentees closes on 13th August (Tuesday).
Ok, I will release the pretest as soon as possible (before 9th August).
Hi, I hope to take on this project. I would like to highlight my strengths and the contributions I can make to the community.
I have a solid understanding of edge computing and LLMs, and I am familiar with the main strategies for LLM cloud-edge collaboration as well as the principles and implementation methods of speculative decoding.
Additionally, I have strong programming skills, particularly in Python and PyTorch. Last year, I interned at a large AI company specializing in LLMs.
In addition, I am conducting research on LLM cloud-edge collaboration strategies using Ianvs. After two months of learning, I have gained a good understanding of Ianvs's architecture, interfaces, and features, and I am now attempting to introduce a new feature relevant to this issue. If I am accepted, integrating these insights could lead to a more coherent and rational architectural design.
I eagerly await your consideration of my taking on this task!
Hi @hsj576
I would like to know more about your weekly meeting. Do you have a calendar link I can add myself to? I would also like to know more about this project.
Also, for collaborative speculative decoding, would you mind sharing the paper describing this technique? I could only find a collaborative decoding paper on arXiv (https://arxiv.org/html/2406.12295v1) and couldn't find one on collaborative speculative decoding.
Details of the pretest are released in #130.