SONG Ge
You may clear the model with `del llm_model`.
You may use `st.cache_resource.clear()` and then recreate the model, as below:

```python
model = create_model(name1)
del model
st.cache_resource.clear()
model = create_model(name2)
```
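For context, here is a minimal Streamlit sketch of how this fits together, assuming `create_model` is a hypothetical loader decorated with `@st.cache_resource` (the names and the loader call are illustrative, not from the code above):

```python
import streamlit as st

@st.cache_resource  # Streamlit caches one model object per argument combination
def create_model(model_name: str):
    # Hypothetical loader; replace with your actual model-loading call.
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(model_name)

model = create_model("model-a")

if st.button("Switch model"):
    del model                  # drop the local reference
    st.cache_resource.clear()  # evict the cached object so its memory can be freed
    model = create_model("model-b")
```

Clearing the cache matters because `del` alone only removes the local name; the object cached inside `st.cache_resource` still holds the model until the cache is cleared.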
Here is my implementation. The error happens when `ret = (*resp->oh.zesInit)(0);` is executed, even though `ze_intel_gpu64.dll` can be loaded:
https://github.com/felipeagc/ollama/blob/main/gpu/gpu_info_oneapi.c
https://github.com/felipeagc/ollama/blob/main/gpu/gpu_info_oneapi.h
I also tried `"C:\Windows\System32\ze_loader.dll"`, but still got error, this also related to `ret = (*resp->oh.zesInit)(0);` : 
Are you running ollama in a Docker container? And could you show the output of `sycl-ls` after activating the oneAPI environment?
May I ask if mllama could be compiled in this PR? I didn't see the relevant CMakeLists.
Hi @pauleseifert. I think this is an OOM issue; you may try setting `OLLAMA_PARALLEL=1` before you start `ollama serve` to reduce memory usage.
1. Sorry for the typo: it should be `OLLAMA_PARALLEL=1` instead of `OLLAMA_NUM_PARALLEL` (see the sketch below for setting it).
2. Could you please check and provide your GPU memory usage when running Ollama?
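If it helps, here is a minimal Python sketch for launching the server with that variable set (equivalent to running `set OLLAMA_PARALLEL=1` before `ollama serve` in a Windows shell; the use of `subprocess` here is just an illustration):

```python
import os
import subprocess

# Copy the current environment and add the parallelism limit.
env = dict(os.environ, OLLAMA_PARALLEL="1")

# Start the ollama server with the modified environment.
subprocess.Popen(["ollama", "serve"], env=env)
```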
Can you provide the memory usage before and after running `ollama run <model>`? That would help us resolve the issue.
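One way to capture the host-RAM side of that automatically is a small `psutil` sketch like the one below (assumptions on my part: `psutil` installed via `pip install psutil`, and a placeholder model name; GPU memory itself is easiest to read from Task Manager's GPU tab on Windows):

```python
import subprocess
import psutil  # assumption: installed via `pip install psutil`

def used_gib() -> float:
    """Host RAM currently in use, in GiB."""
    return psutil.virtual_memory().used / 2**30

before = used_gib()
# Placeholder model name and prompt -- substitute the model you are testing.
subprocess.run(["ollama", "run", "llama3", "hello"], check=True)
after = used_gib()

print(f"host RAM used: {before:.2f} GiB -> {after:.2f} GiB")
```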
Hi @jianjungu, you can also refer to the [ipex-llm ollama quickstart](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md) for the current ollama version.