Martin Evans
What type of memory does your server have? Language models are usually limited mostly by memory bandwidth.
That's actually intentional; it's an approximation copied from llama.cpp. CPU utilisation isn't the right thing to measure, you need to look at tokens per second; if you're memory bound adding...
Sorry, by "memory bound" I didn't mean quantity; it would have been more correct to say memory **bandwidth** bound. That's usually the limiting factor for LLMs.
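As a rough back-of-envelope example (numbers assumed, not measured): a ~4 GB Q4-quantised 7B model on a machine with ~50 GB/s of usable memory bandwidth tops out around 50 / 4 ≈ 12 tokens/s, because every generated token has to stream essentially the whole set of weights through memory. Extra cores can't push past that ceiling once the bandwidth is saturated.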
The problem is that this is extremely hardware dependent. For example, on my own PC (16 physical cores with hyperthreading, so 32 logical cores):

| threads | time |
|---------|------|
|...
For reference (if anyone wants to modify it), the default is implemented here: https://github.com/SciSharp/LLamaSharp/blob/master/LLama/Extensions/IContextParamsExtensions.cs#L53
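To sketch roughly what that default does and how to bypass it from application code (the property name `ModelParams.Threads` and the exact fallback expression here are assumptions for illustration, not copied from that file):

```csharp
using System;
using LLama.Common;

// Illustrative only: the real default lives in IContextParamsExtensions.cs (link above)
// and may use a slightly different expression.
static int DefaultThreadCount(int? requested)
{
    // Fall back to roughly half the logical cores, mirroring the llama.cpp-style
    // approximation mentioned earlier; otherwise honour whatever the caller set.
    return requested is > 0
        ? requested.Value
        : Math.Max(Environment.ProcessorCount / 2, 1);
}

// Overriding the heuristic explicitly (assumed LLamaSharp property name):
var parameters = new ModelParams("model.gguf")
{
    Threads = 8  // pick whatever value benchmarks best on your hardware
};
```

In practice the only reliable way to pick that number is to sweep a few thread counts and compare tokens per second, as in the table above.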
To be clear, I have 32 **logical** cores (i.e. `Environment.ProcessorCount == 32`), so that's why I tested all the way to 32 (I'm using a [Ryzen 7950X](https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors#Ryzen_7000_series)). > For optimal...
That's interesting! Definitely looks like it could be close to what we want. Do you know how this behaves on Linux/macOS (i.e. does it run but return no results, or...
Unfortunately it looks like GitHub Actions doesn't have Windows+ARM available ([docs](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories)) :( Edit: Note that's not a total blocker for this. I think we could cross-compile the DLLs from...
The current LLamaSharp version (0.15.0) is compatible with llama.cpp [b3479](https://github.com/ggerganov/llama.cpp/releases/tag/b3479). You need to make sure you're using that version if you're loading custom binaries.
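If you do need to point LLamaSharp at your own llama.cpp build (from tag b3479 in this case), something along these lines is the usual approach; the exact configuration API has moved around between releases, so treat the call below as an assumption and check the docs for the version you're on:

```csharp
using LLama.Native;

// Assumed API sketch: tell LLamaSharp to load a custom native llama.cpp binary
// instead of the bundled backend package. This must run before the first model
// is loaded, otherwise the default library will already be in memory.
NativeLibraryConfig.Instance.WithLibrary("/path/to/custom/libllama.so");
```

Mixing a custom binary built from a different llama.cpp tag than the one the LLamaSharp release targets is the most common cause of crashes or garbage output, so match the tags first before debugging anything else.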
I believe #565 should support this model, but I haven't tested it. @SidAtBluB0X if you could pull that branch and test that model out it'd be very helpful :)