LFX Mentorship (Jun-Aug, 2024): Enabling LLM fine tuning in the WASI-NN ggml plugin
Summary
Motivation
WasmEdge is a lightweight and cross-platform runtime for LLM applications. It allows developers to create LLM apps on a Mac or Windows dev machine, compile them to Wasm, and deploy them on Nvidia machines without any changes to the binary app.
It achieves application portability across CPUs and GPUs by supporting a W3C standard API called WASI-NN, which abstracts GPU-related AI functions as high-level APIs. At this stage, however, only inference functions are supported.
In this project, we aim to support fine-tuning in WasmEdge, which will improve the developer experience for WasmEdge-enabled LLM tools. To achieve this, we plan to extend the current WASI-NN spec with a set of extra APIs, and then implement them by delegating to the corresponding functions in llama.cpp, which is embedded in the WasmEdge GGML plugin.
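To make the plan concrete, here is a hypothetical sketch of what an extended guest-facing fine-tuning API might look like. None of these names (`FinetuneOptions`, `finetune`, `FinetuneBackend`) exist in the current WASI-NN spec or in WasmEdge; they are illustrative only, and the host side is mocked so the sketch runs without llama.cpp:

```rust
// Hypothetical sketch of a WASI-NN fine-tuning extension.
// All names here are assumptions, not part of the current spec.

/// Options a guest might pass to a new `finetune` call.
#[allow(dead_code)]
struct FinetuneOptions {
    train_data: String, // path to training text, visible via WASI preopens
    lora_out: String,   // where the plugin should write the LoRA adapter
    epochs: u32,
    learning_rate: f32,
}

/// The host side (the ggml plugin) would delegate to llama.cpp's training
/// code; here it is mocked so the sketch is runnable.
trait FinetuneBackend {
    fn finetune(&self, base_model: &str, opts: &FinetuneOptions) -> Result<String, String>;
}

struct MockGgmlPlugin;

impl FinetuneBackend for MockGgmlPlugin {
    fn finetune(&self, base_model: &str, opts: &FinetuneOptions) -> Result<String, String> {
        if base_model.is_empty() {
            return Err("no base model loaded".to_string());
        }
        // A real implementation would run training and write checkpoints;
        // the mock just reports the requested output path.
        Ok(opts.lora_out.clone())
    }
}

fn main() {
    let plugin = MockGgmlPlugin;
    let opts = FinetuneOptions {
        train_data: "train.txt".to_string(),
        lora_out: "lora-adapter.gguf".to_string(),
        epochs: 1,
        learning_rate: 1e-4,
    };
    let out = plugin.finetune("llama-2-7b.Q4_0.gguf", &opts).unwrap();
    println!("LoRA adapter written to {}", out);
}
```

The real API surface would be decided during the spec-extension work; this only shows the shape of the delegation from a guest call down to a host backend.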
Details
Expected outcome:
- Use llama2-7b as the base LLM for fine-tuning; the final implementation should handle it correctly.
- Extend the WASI-NN spec if needed to support the fine-tuning feature.
- Implement the fine-tuning functions inside the WASI-NN ggml plugin. They will call the corresponding functions in llama.cpp, as the inference functions do.
- Implement the LoRA-related functions inside the WASI-NN ggml plugin to load the pre-trained LoRA and verify the fine-tuned model.
- Documentation, examples, tutorials, and a demonstration are required.
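On the LoRA-related items above: applying a LoRA adapter means the fine-tuned weight is the base weight plus a scaled low-rank update, W' = W + (alpha / r) * B * A. The toy sketch below (plain Rust, not plugin code) illustrates that update on a tiny matrix:

```rust
// Conceptual illustration of applying a LoRA adapter:
// W' = W + (alpha / r) * B * A, where A (r x n) and B (m x r) are the
// low-rank matrices stored in the adapter. Tiny dense matrices stand in
// for real model tensors.

fn matmul(a: &[Vec<f32>], b: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (m, k, n) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..m {
        for j in 0..n {
            for t in 0..k {
                out[i][j] += a[i][t] * b[t][j];
            }
        }
    }
    out
}

fn apply_lora(w: &[Vec<f32>], a: &[Vec<f32>], b: &[Vec<f32>], alpha: f32) -> Vec<Vec<f32>> {
    let r = a.len() as f32; // LoRA rank = number of rows of A
    let delta = matmul(b, a);
    w.iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &x)| x + (alpha / r) * delta[i][j])
                .collect()
        })
        .collect()
}

fn main() {
    // 2x2 base weight, rank-1 adapter.
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let a = vec![vec![1.0, 2.0]];        // A: 1x2
    let b = vec![vec![0.5], vec![0.25]]; // B: 2x1
    let merged = apply_lora(&w, &a, &b, 1.0);
    println!("{:?}", merged); // [[1.5, 1.0], [0.25, 1.5]]
}
```

In the plugin, the equivalent work is done on ggml tensors when a pre-trained LoRA is loaded and merged into (or applied alongside) the base model.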
Recommended Skills: C++, WebAssembly, LLM fine-tuning
Since llama.cpp works on CPUs, you do NOT need a GPU device to work on this task.
Application Link
https://mentorship.lfx.linuxfoundation.org/project/41c5a3df-0b84-4b78-b343-bacfc2a3c4ff
Appendix
llama.cpp: https://github.com/ggerganov/llama.cpp
WasmEdge GGML examples: https://github.com/second-state/WasmEdge-WASINN-examples/tree/master/wasmedge-ggml
LlamaEdge: https://github.com/second-state/LlamaEdge
Hi @hydai,
I'm Ankit, and I'm thrilled about the opportunity to contribute to the WasmEdge organization, especially through this project. I believe it aligns well with my skills and interests.
To better prepare myself to become a potential contributor, could you suggest any prerequisite tasks or specific issues that would be beneficial to work on? I'm enthusiastic about collaborating with the team to make meaningful contributions to WasmEdge.
Looking forward to your guidance.
Best regards, Ankit
Hi @Aankirz
This project relies on llama.cpp. We would like to enable the fine-tuning feature in the WASI-NN ggml plugin, so there are a few things you should pay attention to: the WASI-NN spec, fine-tuning in llama.cpp, and how to integrate them into our current plugin.
Hi @hydai sir, I have gone through this project and understand that we need to define new APIs for fine-tuning operations within WASI-NN, understand llama.cpp's fine-tuning, and create bindings in the plugin to call its functions. If applicable, we should also implement LoRA-related functionality such as loading pre-trained adapters. The main focus should be on the WASI-NN spec extensions, understanding llama.cpp fine-tuning, and creating the WASI-NN ggml plugin bindings.
Unfortunately, GSoC did not select this project, so we are moving it to the LFX mentorship program.
Hi @hydai, I want to contribute to this project for the LFX mentorship. Is there any update to the comment you made on March 4? It would be great to get some guidance on the initial tasks and the issues to focus on. Thank you.
No updates. Since this topic was not picked for GSoC, we don't have a mentee to implement anything yet.
Are there any additional comments and issues that I should work on to get started on this project?
@hydai the WASI-NN examples given have all been tested on high-RAM CPUs (about 64 GB). Would this project be doable on a 16 GB machine?
@hydai I am willing to contribute to this project. Can it be run on a base 8 GB RAM M1 Mac?
You can choose a smaller quantization of the model, such as Q2 or Q4, to reduce its size.
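As a rough sanity check on why smaller quantizations help, model memory scales with bits per weight. The bits-per-weight values below are approximations (k-quant formats add per-block scale overhead), not exact file sizes:

```rust
// Back-of-envelope memory estimate for a 7B-parameter model at
// different quantization levels. Bits-per-weight values are approximate.
fn approx_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9
}

fn main() {
    let params = 7e9; // llama-2-7b
    for (name, bpw) in [("F16", 16.0), ("Q4_0", 4.5), ("Q2_K", 2.6)] {
        println!("{:>5}: ~{:.1} GB", name, approx_gb(params, bpw));
    }
}
```

So a Q2/Q4 7B model weighs in at a few GB and can load on a 16 GB (and, tightly, an 8 GB) machine, while F16 would not; fine-tuning adds further memory on top of the weights.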
Hi @hydai, I am interested in contributing to this project and have submitted an application on LFX mentorship.
@hydai I have tried fine-tuning llama-2-7b-chat.Q2_K on my local machine using llama.cpp.
@hydai can you share some resources on how to implement functions in the WASI-NN ggml plugin, please?
Interesting, I want to work on this task.
Hi @hydai, I have applied to the project on LFX and hope to contribute to your mentorship. Thank You