Max Krasnyansky
@slaren @fmz and I worked on further improvements (removing special cases, reducing branches, etc.) and at this point it seems like it should be good to merge. I believe the...
@slaren > The BLAS backend is still important at least in macOS because Accelerate is significantly faster. OpenMP is also not available in macOS. In my opinion this is the...
@slaren Sorry for not catching this earlier. The timing had to be just right to trigger that race. I reproduced it with `while true; do ./llama-bench -m ../gguf/stories260K.gguf -r 10...
@slaren Another quick question. `ggml-blas.cpp` is C++ and is using C++11 stuff like `std::future` when OpenMP is disabled. Would it be OK to do Thread Pool V3 in C++? We...
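For readers unfamiliar with the C++11 facility mentioned above, here is a minimal sketch of fanning work out with `std::async` and collecting the results through `std::future`. It is not the code in `ggml-blas.cpp`; `parallel_sum` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of ggml): sums `data` by splitting it into
// at most `n_tasks` contiguous chunks, each summed by its own std::async task.
inline long long parallel_sum(const std::vector<int> & data, std::size_t n_tasks) {
    assert(n_tasks > 0);

    // Ceiling division so that every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + n_tasks - 1) / n_tasks;

    std::vector<std::future<long long>> tasks;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        // std::launch::async requests that the task run on a separate thread.
        tasks.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        }));
    }

    // future::get() blocks until the corresponding task has finished.
    long long total = 0;
    for (auto & t : tasks) {
        total += t.get();
    }
    return total;
}
```

Each call to `std::async` here may create a fresh thread, which is one reason a persistent thread pool can be preferable for work that is dispatched repeatedly.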
Ah. I didn't see this issue earlier. I kind of started looking into this already as well. I wanted to publish winget packages for Windows on ARM64 (Snapdragon X-Elite) and...
Sorry if I wasn't clear. My plan was to publish decent CPU versions to start so that a simple `winget install llama.cpp` works. Users get a usable version with basically zero effort....
> @max-krasnyansky : Ollama v0.3.12 supports winget install and it now also works great / native on my Snapdragon X Elite Surface Laptop 7 on Windows (for ARM). I did...
Making the C++ standard conditional on some backend is not a good idea. We recently decided to go C++17 by default. PopOS is based on Ubuntu. Why don't you just get...
Ah. Yeah, sorry, my bad. You need a newer CUDA toolkit. CUDA 12 should work (it does on Ubuntu 24.04). https://toranbillups.com/blog/archive/2023/08/19/install-cuda-12-on-popos/ Technically you can explicitly specify the host compiler to use...
@ggerganov @slaren Any thoughts on bumping C++11 to C++20? I'd love to enable things like `std::span` (C++20), `std::string_view` (C++17) and other C++17 features from `atomic` and `thread` libraries (shared_mutex, hardware_interference_size,...
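To make the request above concrete, here is a small sketch of two of the C++17 features named: `std::string_view` and `std::shared_mutex`. The names `file_extension` and `Registry` are hypothetical and do not come from the llama.cpp codebase; the block assumes a compiler in C++17 mode (e.g. `-std=c++17`).

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <string_view>

// C++17: std::string_view refers to existing characters without copying them.
// Hypothetical helper: returns the text after the last '.', or an empty view.
inline std::string_view file_extension(std::string_view name) {
    const std::size_t dot = name.rfind('.');
    return dot == std::string_view::npos ? std::string_view{} : name.substr(dot + 1);
}

// C++17: std::shared_mutex admits many concurrent readers or a single writer.
// Hypothetical example class, for illustration only.
class Registry {
public:
    void set(const std::string & key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive (writer) lock
        values_[key] = value;
    }

    int get(const std::string & key, int fallback) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared (reader) lock
        const auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> values_;
};
```

With C++11 alone, the first would typically take a `const std::string &` (forcing a copy for substrings) and the second would fall back to a plain `std::mutex` that serializes readers as well as writers.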