
Question: Possibility of simplifying running AI model-specific wasm workloads with embedded configuration

Open sohankunkerkar opened this issue 1 year ago • 7 comments

Summary

I have been using WasmEdge with the llama2 model, and it's working great with the following command:

$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
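For context, the flags break down as follows (the annotations below are added for clarity; the NAME:BACKEND:TARGET:PATH layout of --nn-preload follows the WasmEdge WASI-NN GGML examples):

# --dir HOST:GUEST preopens a host directory for the wasm sandbox
#   (here the current directory, visible inside the guest as ".")
# --nn-preload NAME:BACKEND:TARGET:PATH registers the GGUF model under
#   the name "default" with the GGML backend; AUTO lets the runtime
#   pick the execution target
$ wasmedge --dir .:. \
    --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
    llama-chat.wasm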

In a containerized environment where WasmEdge is integrated into a lower-level runtime called crun-wasm, I would like to simplify running the llama2 model. The goal is to embed configuration settings directly into the wasm workload, allowing for a more straightforward execution in the containerized environment using crun-wasm.

Is it possible to configure the llama-chat.wasm workload so that the essential settings (such as --dir and --nn-preload) are embedded directly into the wasm file? This would enable us to run the model like:

$ ./llama-chat.wasm

Please provide guidance or suggestions on achieving a more streamlined execution of AI-integrated wasm workloads in a containerized environment where WasmEdge is integrated into a lower-level runtime.
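(For concreteness: a thin wrapper script like the sketch below would give that one-command feel today, but it only hides the flags on the host rather than embedding them in the wasm file, which is why a runtime-level solution is being requested. The script name is hypothetical.)

#!/bin/sh
# llama-chat.sh: hypothetical wrapper that hides the wasmedge flags
# and forwards any extra arguments to the wasm module
exec wasmedge --dir .:. \
    --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
    llama-chat.wasm "$@"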


sohankunkerkar, Feb 14 '24

Hi @sohankunkerkar

We have been working hard to get the WasmEdge GGML plugin running under crun-wasm. It is high on our priority list.

One sticking point is that crun does not detect the correct CUDA version on the host machine, and we are not sure why. Perhaps @hydai can shed more light on this?

juntao, Feb 14 '24

@juntao Thanks for the reply. Let me know if you need help moving the crun-specific issue forward.

sohankunkerkar, Feb 14 '24

Hi @CaptainVincent We would like to have an issue tracking the crun + WasmEdge + GGML plugin integration. Could you please raise a new issue and describe the current status of the crun-related integration?

hydai, Feb 14 '24

https://github.com/WasmEdge/WasmEdge/issues/3217 has been added.

CaptainVincent, Feb 15 '24

@sohankunkerkar You mentioned "simplifying" running llama in a containerized environment. Could you please point me to a resource with steps for running llama2 in a crun + wasm environment that currently works?

shiveshcodes, May 27 '24

Here are two documents that you might find useful:

https://wasmedge.org/docs/start/build-and-run/docker_wasm_gpu
https://wasmedge.org/docs/start/build-and-run/podman_wasm_gpu
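As a rough sketch of the podman flow described there: the container is annotated as a wasm workload so that crun dispatches it to its built-in WasmEdge handler. The image name and mount path below are placeholders, and the exact annotation value should be checked against the linked docs:

$ podman run --rm \
    --annotation "module.wasm.image/variant=compat" \
    -v /path/to/models:/models \
    ghcr.io/example/llama-chat:latest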

CaptainVincent, Jun 04 '24

@shiveshcodes, I believe the links provided by @CaptainVincent would be helpful for your needs. If you're interested in the CRI-O + crun workflow, you can also check out this link: https://github.com/sohankunkerkar/wasm-kubecon-demos/tree/main.

sohankunkerkar, Jun 04 '24