Lukas Kreussel
I could give it a try, but I'm still kind of busy with the CUDA/OpenCL stuff, and I have no idea how I would implement performance metrics and logging correctly in...
LoRA files should always start with the `ggla` magic (see [here](https://github.com/ggerganov/llama.cpp/blob/master/convert-lora-to-ggml.py#L51)). Could you link to the LoRA file you were using? And could it be that you used a...
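For reference, a minimal sketch of such a magic check, assuming the magic is stored as a little-endian `u32` (the file path is just a placeholder):

```python
import struct

GGLA_MAGIC = 0x67676C61  # "ggla" interpreted as a little-endian u32

def is_ggla_file(path: str) -> bool:
    """Return True if the file starts with the `ggla` magic number."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return False
    (magic,) = struct.unpack("<I", header)
    return magic == GGLA_MAGIC

print(is_ggla_file("adapter_model.bin"))  # hypothetical LoRA file path
```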
Shouldn't switching the backend be possible after https://github.com/ggerganov/llama.cpp/pull/2239 is implemented?
Well, you can calculate it: 13B parameters × 16 bits (2 bytes, f16) = 26 GB. Accelerate will probably try to page some of the layers if you exceed your 16 GB...
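As a quick sanity check of that arithmetic:

```python
params = 13e9          # 13B parameters
bytes_per_param = 2    # f16 = 16 bits = 2 bytes
gb = params * bytes_per_param / 1e9
print(f"{gb:.0f} GB")  # -> 26 GB
```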
What format is your model in? `rustformers` currently only supports `GGJT` models; `GGUF` support still needs some more time. The error you're getting means your model has an unknown file type.
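A small sketch to identify the container format from the file's magic number; the constants below are the GGML-family magics as I remember them from the llama.cpp headers, so double-check them against the source:

```python
import struct

# Known GGML-family magic numbers (little-endian u32)
MAGICS = {
    0x67676D6C: "GGML",
    0x67676D66: "GGMF",
    0x67676A74: "GGJT",
    0x46554747: "GGUF",
}

def model_format(path: str) -> str:
    """Read the first 4 bytes and map them to a known format name."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown (0x{magic:08X})")

print(model_format("ggml-model-q4_0.bin"))  # hypothetical model path
```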
This is the related PR in the rustformers repo https://github.com/rustformers/llm/pull/412
`llm-rs-python` is a wrapper around [rustformers/llm](https://github.com/rustformers/llm). If Falcon support lands in [rustformers/llm](https://github.com/rustformers/llm) via https://github.com/rustformers/llm/issues/293, I'll include it in the wrapper.
I don't think the current `langchain` wrapper supports async calls, but it shouldn't be too hard to add, as the `model.stream()` call already releases the GIL internally while generating tokens....
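A minimal sketch of how that could look: because `stream()` releases the GIL while generating, the blocking iterator can be driven from asyncio via the default thread-pool executor without stalling the event loop. The `model.stream(prompt)` call is taken from the comment above; everything else is an assumption, not the wrapper's actual API.

```python
import asyncio

async def astream(model, prompt: str):
    """Drive the blocking model.stream() generator from asyncio."""
    loop = asyncio.get_running_loop()
    it = iter(model.stream(prompt))  # assumed signature, see note above
    sentinel = object()
    while True:
        # next(it, sentinel) runs in a worker thread; the released GIL
        # lets generation proceed while the event loop stays responsive
        token = await loop.run_in_executor(None, next, it, sentinel)
        if token is sentinel:
            break
        yield token

# usage sketch:
# async for token in astream(model, "Hello"):
#     print(token, end="", flush=True)
```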
https://github.com/louisgv/local.ai/assets/65088241/6173b693-d020-4098-9f35-517a720f1044 @louisgv The token callback to the UI seems to slow down the feeding process significantly. (It should be instant.)
Currently this copies the CUDA DLLs next to the local.ai executable if the `cargo tauri dev` or `cargo tauri build` command is executed with the `--features cublas` flag. @louisgv Is...