
LFX Mentorship (Jun-Aug, 2024): Enabling LLM fine tuning in the WASI-NN ggml plugin

Open hydai opened this issue 2 years ago • 15 comments

Summary

Motivation

WasmEdge is a lightweight and cross-platform runtime for LLM applications. It allows developers to create LLM apps on a Mac or Windows dev machine, compile them to Wasm, and deploy them on Nvidia machines without any changes to the binary app.

It achieves application portability across CPUs and GPUs by supporting a W3C standard API called WASI-NN, which abstracts GPU-related AI functions as high-level APIs. At this stage, however, only inference functions are supported.

In this project, we aim to support fine-tuning features in WasmEdge. It will improve the developer experience for WasmEdge-enabled LLM tools. To achieve this, we plan to extend the current WASI-NN spec by adding a set of extra APIs, and then implement them by delegating to corresponding functions in llama.cpp embedded in the WasmEdge GGML plugin.

Details

Expected outcome:

  • Use llama2-7b as the base LLM for fine-tuning; the final implementation should handle it correctly.
  • Extend the WASI-NN spec if needed to support the fine-tuning feature.
  • Implement the fine-tuning functions inside the WASI-NN ggml plugin. They will call the corresponding functions in llama.cpp, as the inference functions do.
  • Implement the LoRA-related functions inside the WASI-NN ggml plugin to load the pre-trained LoRA and verify the fine-tuned model.
  • Documentation, examples, tutorials, and demonstration are required.

Recommended Skills: C++, WebAssembly, LLM fine-tuning

Since llama.cpp works on CPUs, you do NOT need a GPU device to work on this task.

Application Link

https://mentorship.lfx.linuxfoundation.org/project/41c5a3df-0b84-4b78-b343-bacfc2a3c4ff

Appendix

  • llama.cpp: https://github.com/ggerganov/llama.cpp
  • WasmEdge GGML examples: https://github.com/second-state/WasmEdge-WASINN-examples/tree/master/wasmedge-ggml
  • LlamaEdge: https://github.com/second-state/LlamaEdge

hydai avatar Feb 08 '24 19:02 hydai

Hi @hydai,

I'm Ankit, and I'm thrilled about the opportunity to contribute to the WasmEdge organization, especially through this project. I believe it aligns well with my skills and interests.

To better prepare myself to become a potential contributor, could you suggest any prerequisite tasks or specific issues that would be beneficial to work on? I'm enthusiastic about collaborating with the team to make meaningful contributions to WasmEdge.

Looking forward to your guidance.

Best regards, Ankit

Aankirz avatar Mar 03 '24 06:03 Aankirz

Hi @Aankirz This project relies on llama.cpp. We would like to enable the fine-tuning feature in the WASI-NN ggml plugin, so there are a few things you should pay attention to: the WASI-NN spec, fine-tuning in llama.cpp, and how to integrate the two into our current plugin.

hydai avatar Mar 04 '24 13:03 hydai

Hi @hydai sir, I have gone through this project and understand that we need to define new APIs for fine-tuning operations within WASI-NN, understand llama.cpp's fine-tuning, and create bindings in the plugin to call its functions. If applicable, we would also implement LoRA-related functionality such as loading pre-trained adapters. The main focus should be on the WASI-NN spec extensions, understanding llama.cpp fine-tuning, and creating the WASI-NN ggml plugin bindings.

Sayanjones avatar Mar 10 '24 21:03 Sayanjones

Unfortunately, GSoC did not select this, moving it to LFX mentorship.

hydai avatar May 02 '24 16:05 hydai

Hi @hydai, I want to contribute to this project through the LFX mentorship. Is there any update on the comment you made on March 4? It would be great to get some guidance on initial tasks and which issues to focus on. Thank you.

abhinavs001 avatar May 14 '24 12:05 abhinavs001

No updates. Since this topic was not picked for GSoC, no mentee has implemented anything yet.

hydai avatar May 14 '24 13:05 hydai

Are there any additional comments or issues that I should work on to get started on this project?

abhinavs001 avatar May 15 '24 21:05 abhinavs001

@hydai the WASI-NN examples given have all been tested on high-RAM CPUs (about 64 GB). Would this project be doable with a 16 GB CPU?

aamod-wick avatar May 17 '24 06:05 aamod-wick

@hydai I am willing to contribute to this project. Can it be run on a base 8 GB RAM M1 Mac?

lazyperson1020 avatar May 17 '24 08:05 lazyperson1020

> @hydai the WASI-NN examples given have all been tested on high-RAM CPUs (about 64 GB). Would this project be doable with a 16 GB CPU?

> @hydai I am willing to contribute to this project. Can it be run on a base 8 GB RAM M1 Mac?

You can choose a smaller quantization of the model, such as Q2 or Q4, to reduce the memory footprint.

hydai avatar May 17 '24 15:05 hydai

Hi @hydai, I am interested in contributing to this project and have submitted an application on LFX mentorship.

WayfaringKid avatar May 22 '24 06:05 WayfaringKid

@hydai I have tried fine-tuning llama-2-7b-chat.Q2_K on my local machine using llama.cpp. (screenshot attached)

lazyperson1020 avatar May 23 '24 03:05 lazyperson1020

@hydai can you give some resources to read about how to implement functions in the WASI-NN ggml plugin, please?

lazyperson1020 avatar May 23 '24 06:05 lazyperson1020

Interesting. I want to work on this task.

crocmons avatar May 23 '24 08:05 crocmons

Hi @hydai, I have applied to the project on LFX and hope to contribute through your mentorship. Thank you.

ThienNguyen27 avatar May 27 '24 13:05 ThienNguyen27