LFX Mentorship (Jun-Aug, 2024): Enabling LLM fine tuning in the WASI-NN ggml plugin
Summary
Motivation
WasmEdge is a lightweight and cross-platform runtime for LLM applications. It allows developers to create LLM apps on a Mac or Windows dev machine, compile them to Wasm, and deploy them on Nvidia machines without any changes to the binary app.
It achieves application portability across CPUs and GPUs by supporting a W3C standard API called WASI-NN, which abstracts GPU-related AI functions as high-level APIs. At this stage, however, only inference functions are supported.
In this project, we aim to support fine-tuning in WasmEdge, which will improve the developer experience for WasmEdge-enabled LLM tools. To achieve this, we plan to extend the current WASI-NN spec with a set of extra APIs, and then implement them by delegating to the corresponding functions in llama.cpp, which is embedded in the WasmEdge GGML plugin.
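To make the plan concrete, here is a hypothetical sketch of what an extended guest-facing fine-tuning API might look like. None of these names (`FinetuneOptions`, `finetune`, `FinetuneBackend`) exist in the current WASI-NN spec or in WasmEdge; they are illustrative only, and the host side is mocked so the sketch runs without llama.cpp:

```rust
// Hypothetical sketch of a WASI-NN fine-tuning extension.
// All names here are assumptions, not part of the current spec.

/// Options a guest might pass to a new `finetune` call.
#[allow(dead_code)]
struct FinetuneOptions {
    train_data: String, // path to training text, visible via WASI preopens
    lora_out: String,   // where the plugin should write the LoRA adapter
    epochs: u32,
    learning_rate: f32,
}

/// The host side (the ggml plugin) would delegate to llama.cpp's training
/// code; here it is mocked so the sketch is runnable.
trait FinetuneBackend {
    fn finetune(&self, base_model: &str, opts: &FinetuneOptions) -> Result<String, String>;
}

struct MockGgmlPlugin;

impl FinetuneBackend for MockGgmlPlugin {
    fn finetune(&self, base_model: &str, opts: &FinetuneOptions) -> Result<String, String> {
        if base_model.is_empty() {
            return Err("no base model loaded".to_string());
        }
        // A real implementation would run training and write checkpoints;
        // the mock just reports the requested output path.
        Ok(opts.lora_out.clone())
    }
}

fn main() {
    let plugin = MockGgmlPlugin;
    let opts = FinetuneOptions {
        train_data: "train.txt".to_string(),
        lora_out: "lora-adapter.gguf".to_string(),
        epochs: 1,
        learning_rate: 1e-4,
    };
    let out = plugin.finetune("llama-2-7b.Q4_0.gguf", &opts).unwrap();
    println!("LoRA adapter written to {}", out);
}
```

The real API surface would be decided during the spec-extension work; this only shows the shape of the delegation from a guest call down to a host backend.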
Details
Expected outcome:
- Use llama2-7b as the base LLM for fine-tuning; the final implementation should handle it correctly.
- Extend the WASI-NN spec if needed to support the fine-tuning feature.
- Implement the fine-tuning functions inside the WASI-NN ggml plugin. They will call the corresponding functions in llama.cpp, as the inference functions do.
- Implement the LoRA-related functions inside the WASI-NN ggml plugin to load the pre-trained LoRA and verify the fine-tuned model.
- Documentation, examples, tutorials, and a demonstration are required.
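On the LoRA-related items above: applying a LoRA adapter means the fine-tuned weight is the base weight plus a scaled low-rank update, W' = W + (alpha / r) * B * A. The toy sketch below (plain Rust, not plugin code) illustrates that update on a tiny matrix:

```rust
// Conceptual illustration of applying a LoRA adapter:
// W' = W + (alpha / r) * B * A, where A (r x n) and B (m x r) are the
// low-rank matrices stored in the adapter. Tiny dense matrices stand in
// for real model tensors.

fn matmul(a: &[Vec<f32>], b: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (m, k, n) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..m {
        for j in 0..n {
            for t in 0..k {
                out[i][j] += a[i][t] * b[t][j];
            }
        }
    }
    out
}

fn apply_lora(w: &[Vec<f32>], a: &[Vec<f32>], b: &[Vec<f32>], alpha: f32) -> Vec<Vec<f32>> {
    let r = a.len() as f32; // LoRA rank = number of rows of A
    let delta = matmul(b, a);
    w.iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &x)| x + (alpha / r) * delta[i][j])
                .collect()
        })
        .collect()
}

fn main() {
    // 2x2 base weight, rank-1 adapter.
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let a = vec![vec![1.0, 2.0]];        // A: 1x2
    let b = vec![vec![0.5], vec![0.25]]; // B: 2x1
    let merged = apply_lora(&w, &a, &b, 1.0);
    println!("{:?}", merged); // [[1.5, 1.0], [0.25, 1.5]]
}
```

In the plugin, the equivalent work is done on ggml tensors when a pre-trained LoRA is loaded and merged into (or applied alongside) the base model.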
Recommended Skills: C++, WebAssembly, LLM fine-tuning
Since llama.cpp works on CPUs, you do NOT need a GPU device to work on this task.
Application Link
https://mentorship.lfx.linuxfoundation.org/project/41c5a3df-0b84-4b78-b343-bacfc2a3c4ff
Appendix
llama.cpp: https://github.com/ggerganov/llama.cpp
WasmEdge GGML examples: https://github.com/second-state/WasmEdge-WASINN-examples/tree/master/wasmedge-ggml
LlamaEdge: https://github.com/second-state/LlamaEdge
Hi @hydai,
I'm Ankit, and I'm thrilled about the opportunity to contribute to the WasmEdge organization, especially through this project. I believe it aligns well with my skills and interests.
To better prepare myself to become a potential contributor, could you suggest any prerequisite tasks or specific issues that would be beneficial to work on? I'm enthusiastic about collaborating with the team to make meaningful contributions to WasmEdge.
Looking forward to your guidance.
Best regards, Ankit
Hi @Aankirz
This project relies on llama.cpp. We would like to enable the fine-tuning feature in the WASI-NN ggml plugin, so there are a few things you should pay attention to: the WASI-NN spec, fine-tuning in llama.cpp, and how to integrate them into our current plugin.
Hi @hydai sir, I have gone through this project and understand that we need to define new APIs for fine-tuning operations within WASI-NN, understand llama.cpp's fine-tuning, and create bindings in the plugin to call its functions. If applicable, we should also implement LoRA-related functionality such as loading pre-trained adapters. The main focus should be on the WASI-NN spec extensions, understanding llama.cpp fine-tuning, and creating the WASI-NN ggml plugin bindings.
Unfortunately, GSoC did not select this project, so we are moving it to the LFX mentorship program.
Hi @hydai, I want to contribute to this project for the LFX mentorship. Is there any update to the comment you made on March 4? It would be great to get some guidance on the initial tasks and the issues to focus on. Thank you.
No updates. Since this topic was not picked for GSoC, we don't have a mentee to implement anything yet.
Are there any additional comments and issues that I should work on to get started on this project?
@hydai the WASI-NN examples given have all been tested on high-RAM CPUs (about 64 GB). Would this project be doable on a 16 GB machine?
@hydai I am willing to contribute to this project. Can it be run on a base 8 GB RAM M1 Mac?
You can choose a smaller quantization of the model, such as Q2 or Q4, to reduce its size.
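As a rough sanity check on why smaller quantizations help, model memory scales with bits per weight. The bits-per-weight values below are approximations (k-quant formats add per-block scale overhead), not exact file sizes:

```rust
// Back-of-envelope memory estimate for a 7B-parameter model at
// different quantization levels. Bits-per-weight values are approximate.
fn approx_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9
}

fn main() {
    let params = 7e9; // llama-2-7b
    for (name, bpw) in [("F16", 16.0), ("Q4_0", 4.5), ("Q2_K", 2.6)] {
        println!("{:>5}: ~{:.1} GB", name, approx_gb(params, bpw));
    }
}
```

So a Q2/Q4 7B model weighs in at a few GB and can load on a 16 GB (and, tightly, an 8 GB) machine, while F16 would not; fine-tuning adds further memory on top of the weights.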
Hi @hydai, I am interested in contributing to this project and have submitted an application on LFX mentorship.
@hydai I have tried fine-tuning llama-2-7b-chat.Q2_K on my local machine using llama.cpp.
@hydai can you share some resources on how to implement functions in the WASI-NN ggml plugin, please?
Interesting, I want to work on this task.
Hi @hydai, I have applied to the project on LFX and hope to contribute to your mentorship. Thank You