Akarshan Biswas
@ngxson ~~Can you please refresh this branch with master?~~ Nvm. Ended up using your fork .. ~~working great!!!~~ 👍 On further testing, it seems that llama_batch_size is sometimes exceeded in successive...
Just to point out that during nnx model training, overall GPU usage does not go above 20%. Will splitting the model into a graphdef and state improve performance?
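For reference, a minimal sketch of the split/merge pattern in question, assuming Flax NNX with plain `jax.jit`; the `MLP` module and the naive SGD update below are illustrative stand-ins, not the actual training code:

```python
import jax
import jax.numpy as jnp
from flax import nnx

class MLP(nnx.Module):  # hypothetical stand-in for the real model
    def __init__(self, din, dhidden, dout, *, rngs: nnx.Rngs):
        self.fc1 = nnx.Linear(din, dhidden, rngs=rngs)
        self.fc2 = nnx.Linear(dhidden, dout, rngs=rngs)

    def __call__(self, x):
        return self.fc2(nnx.relu(self.fc1(x)))

model = MLP(32, 128, 10, rngs=nnx.Rngs(0))
# Split the stateful Module into a static graphdef and a pytree of arrays,
# so the training step can be a pure function under plain jax.jit.
graphdef, state = nnx.split(model)

@jax.jit
def train_step(state, x, y):
    def loss_fn(state):
        m = nnx.merge(graphdef, state)  # rebuild the module inside the traced function
        preds = m(x)
        return jnp.mean((preds - y) ** 2)

    loss, grads = jax.value_and_grad(loss_fn)(state)
    # Naive SGD update applied directly on the state pytree (illustrative only).
    state = jax.tree.map(lambda p, g: p - 1e-3 * g, state, grads)
    return state, loss

x = jnp.ones((8, 32))
y = jnp.zeros((8, 10))
state, loss = train_step(state, x, y)
```

My understanding is that split/merge mainly removes the per-call Python graph-traversal overhead that `nnx.jit` otherwise incurs, so it should help if per-step Python overhead is the bottleneck, not if the GPU is starved by the input pipeline or small batch sizes.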
Fixed in 0.6.6 with dbdc03158300dea06c1f3ff025fe4c7ceff66969 and subsequent commits. Now the backend manages its own llama-server state.
This is an AppImage bundling problem. The only way to fix it is to either build it from source on RHEL 9.6 or wait for the Flatpak package. Unfortunately, the Flatpak package...
What happens when you append `--log-verbose` to llama-server? Unfortunately, I don't have a Windows machine to test on.
If you are able to share a stack trace, that would be very helpful; it would let us pinpoint the issue.
Also, adding to this: proper function-calling support in the server, since Llama 3.1 now supports tool/function calling.
> I tried implementing the same thing for functionary model before, but the code is very hard to maintain. ~~Can you point me to that commit?~~ Edit: @ngxson Got the...
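For context, a minimal sketch of what the requested server-side function calling could look like from the client side, assuming an OpenAI-compatible `/v1/chat/completions` endpoint on llama-server; the tool schema, model name, and port are illustrative assumptions:

```python
# Illustrative client request with an OpenAI-style "tools" payload.
import json
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for demonstration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed local llama-server
    json={
        "model": "llama-3.1-8b-instruct",  # placeholder model name
        "messages": [{"role": "user", "content": "What's the weather in Kolkata?"}],
        "tools": tools,
        "tool_choice": "auto",
    },
    timeout=60,
)
choice = resp.json()["choices"][0]
# If the model decides to call the tool, the arguments come back as a JSON string.
for call in choice["message"].get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```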
Moving to Tauri also opens up an opportunity to integrate llama.cpp Rust bindings directly into Jan for the llama.cpp provider extension. Doing so would: * **Provide enhanced hardware insights within Jan**,...
For now, I think we should support the backends that ggml supports for local inference. MLX, which can expose an OpenAI-compatible server, can already be used as an external provider.
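As a rough sketch of that external-provider route, assuming an MLX OpenAI-compatible server (e.g. one started with `mlx_lm.server`) is already running locally; the host, port, and model name below are assumptions:

```python
# Point a standard OpenAI client at a locally running MLX server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from an external MLX provider!"}],
)
print(resp.choices[0].message.content)
```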