Martin Evans
It sounds like there are two parts to this request:

- Ability to configure the backend before it is loaded.
- At the moment this isn't possible, because the DLL loading...
Loading all of the backends at once wouldn't work with the current system, because the native methods are written like this:

```csharp
[DllImport("libllama")]
public static extern void demo_method();
```

That...
> What if we wrote a bunch of delegates as a sort of wrapped API, and use LoadLibrary to swap dll's out during runtime

As I understand it that's roughly...
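For illustration, here's a minimal sketch of that delegate-based approach, assuming .NET's `NativeLibrary` API; the `LlamaNative` wrapper class and the `demo_method` export (carried over from the snippet above) are hypothetical:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: binds exports through delegates at runtime instead of
// compile-time [DllImport] attributes, so the backing DLL can be chosen late.
public static class LlamaNative
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate void DemoMethodDelegate();

    private static DemoMethodDelegate? _demoMethod;

    public static void LoadBackend(string libraryPath)
    {
        // NativeLibrary.Load throws if the library can't be found or loaded.
        IntPtr handle = NativeLibrary.Load(libraryPath);
        IntPtr export = NativeLibrary.GetExport(handle, "demo_method");
        _demoMethod = Marshal.GetDelegateForFunctionPointer<DemoMethodDelegate>(export);
    }

    public static void DemoMethod()
    {
        if (_demoMethod == null)
            throw new InvalidOperationException("No backend loaded");
        _demoMethod();
    }
}
```

The trade-off is that every native call goes through a delegate indirection, and every entry point needs this boilerplate (or source generation) instead of a one-line `[DllImport]`.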
Looks like you've already worked it out, but `NativeLibraryConfig.Default.WithLibrary(...)` is the way to do this :)

Please note though that you **cannot** just download the latest DLL from llama.cpp -...
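As a rough sketch of the intended usage (the exact overload varies between LLamaSharp versions, and the path shown is a placeholder), the call has to happen before any other LLamaSharp API touches the native library:

```csharp
using LLama.Native;

// Must run before anything loads the native library. The path is a
// placeholder, and must point at a llama.cpp binary built from the exact
// commit this LLamaSharp version targets.
NativeLibraryConfig.Default.WithLibrary("path/to/llama.dll");
```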
I do think this is a _viable_ design: the backend could be specified when you load the model, and from then on it can be handled automatically within LLamaSharp...
> Couldn't we just automatically and only load those that the CPU/GPUs support based on interrogating the OS?

That's actually what we already do. CUDA binaries are loaded based on...
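To illustrate the kind of interrogation involved (a hypothetical sketch, not LLamaSharp's actual selection code), .NET exposes CPU feature flags that can drive the choice of binary variant:

```csharp
using System.Runtime.Intrinsics.X86;

// Pick a native binary variant based on what the host CPU supports.
// The variant names and path layout here are illustrative only.
string variant =
    Avx2.IsSupported ? "avx2" :
    Avx.IsSupported  ? "avx"  :
                       "noavx";

string libraryPath = $"runtimes/win-x64/native/{variant}/llama.dll";
```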
I don't think we do anything specific for OpenCL at the moment, but this:

> Ideally we just want to have it work as fast as it can

is definitely...
OpenCL support will be merged in with #479, and will probably be included in the next release (some work is still needed to create the new NuGet packages).
> Batched inference is not user-friendly

That's mostly because it's not designed to be 😆 The `BatchedExecutor` is the "minimum viable product" to expose low-level primitives in a safe...
(Just to note I haven't looked at #683 yet. I wasn't suggesting things that should be added to that specific PR, just the general direction of the project overall for...