Martin Evans
It sounds like there are two parts to this request:

- Ability to configure the backend before it is loaded.
- At the moment this isn't possible, because the DLL loading...
Loading all of the backends at once wouldn't work with the current system, because the native methods are written like this:

```csharp
[DllImport("libllama")]
public static extern void demo_method();
```

That...
> What if we wrote a bunch of delegates as a sort of wrapped API, and use LoadLibrary to swap dll's out during runtime

As I understand it that's roughly...
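For illustration, here's a minimal sketch of that delegate-based approach, assuming .NET's `NativeLibrary` API; the `LlamaNative` wrapper class and the `demo_method` export (carried over from the snippet above) are hypothetical:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: binds exports through delegates at runtime instead of
// compile-time [DllImport] attributes, so the backing DLL can be chosen late.
public static class LlamaNative
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate void DemoMethodDelegate();

    private static DemoMethodDelegate? _demoMethod;

    public static void LoadBackend(string libraryPath)
    {
        // NativeLibrary.Load throws if the library can't be found or loaded.
        IntPtr handle = NativeLibrary.Load(libraryPath);
        IntPtr export = NativeLibrary.GetExport(handle, "demo_method");
        _demoMethod = Marshal.GetDelegateForFunctionPointer<DemoMethodDelegate>(export);
    }

    public static void DemoMethod()
    {
        if (_demoMethod == null)
            throw new InvalidOperationException("No backend loaded");
        _demoMethod();
    }
}
```

The trade-off is that every native call goes through a delegate indirection, and every entry point needs this boilerplate (or source generation) instead of a one-line `[DllImport]`.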
Looks like you've already worked it out, but `NativeLibraryConfig.Default.WithLibrary(...)` is the way to do this :)

Please note though that you **cannot** just download the latest DLL from llama.cpp -...
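As a rough sketch of the intended usage (the exact overload varies between LLamaSharp versions, and the path shown is a placeholder), the call has to happen before any other LLamaSharp API touches the native library:

```csharp
using LLama.Native;

// Must run before anything loads the native library. The path is a
// placeholder, and must point at a llama.cpp binary built from the exact
// commit this LLamaSharp version targets.
NativeLibraryConfig.Default.WithLibrary("path/to/llama.dll");
```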
I do think this is a _viable_ design: the backend could be specified when you load the model, and from then on it can be handled automatically within LLamaSharp...
> Couldn't we just automatically and only load those that the CPU/GPUs support based on interrogating the OS?

That's actually what we already do. CUDA binaries are loaded based on...
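To illustrate the kind of interrogation involved (a hypothetical sketch, not LLamaSharp's actual selection code), .NET exposes CPU feature flags that can drive the choice of binary variant:

```csharp
using System.Runtime.Intrinsics.X86;

// Pick a native binary variant based on what the host CPU supports.
// The variant names and path layout here are illustrative only.
string variant =
    Avx2.IsSupported ? "avx2" :
    Avx.IsSupported  ? "avx"  :
                       "noavx";

string libraryPath = $"runtimes/win-x64/native/{variant}/llama.dll";
```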
I don't think we do anything specific for OpenCL at the moment, but this:

> Ideally we just want to have it work as fast as it can

is definitely...
OpenCL support will be merged in with #479, and will probably be included in the next release (some work is still needed to create the new NuGet packages).
> Batched inference is not user-friendly

That's mostly because it's not designed to be 😆 The `BatchedExecutor` is the "minimum viable product" to expose low-level primitives in a safe...
(Just to note I haven't looked at #683 yet. I wasn't suggesting things that should be added to that specific PR, just the general direction of the project overall for...