Pascal
I built this as an exercise. Now I’ll be able to add it to my llama.cpp dev server page, so when I click on a .gguf, it’ll launch the backend...
Next, I’ll try to visualize the quantization blocks, those little per-group slices (like 32×N tiles), and maybe add some filters to highlight scale or residual patterns. Later, I’ll make it...
Built this in one day, as KISS as possible: pure C++/GGML + vanilla JS. I still need to transfer FP32 weights in binary instead of JSON to squeeze out more...
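To illustrate the binary-instead-of-JSON idea, here is a minimal sketch of packing FP32 values into a raw byte buffer. The helper names `pack_f32` and `unpack_f32` are hypothetical and not taken from the project. The sketch assumes a little-endian host and leaves out the HTTP transport entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy FP32 values into a raw byte buffer,
// 4 bytes per value, in host byte order (assumed little-endian).
std::vector<std::uint8_t> pack_f32(const std::vector<float>& values) {
    std::vector<std::uint8_t> out(values.size() * sizeof(float));
    if (!values.empty()) {
        std::memcpy(out.data(), values.data(), out.size());
    }
    return out;
}

// Hypothetical inverse: rebuild the float vector from the raw bytes.
std::vector<float> unpack_f32(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() % sizeof(float) == 0);  // payload must hold whole floats
    std::vector<float> out(bytes.size() / sizeof(float));
    if (!bytes.empty()) {
        std::memcpy(out.data(), bytes.data(), bytes.size());
    }
    return out;
}
```

Each value costs exactly 4 bytes this way, whereas a JSON number is a decimal string that usually takes more characters plus a separator. On the browser side, such a payload could presumably be viewed through a `Float32Array` over the response buffer, without parsing.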
> [@ServeurpersoCom](https://github.com/ServeurpersoCom) Very cool! Showing the weight values when hovering on the pixels would be useful.

Sure, on it :) OBS doesn’t capture Firefox tooltips, so when I tried to...
I'm noticing some strange artifacts on certain slices of specific models: they look like repeated patterns along one axis, which could be either mathematically expected or a quantization glitch. When hovering...
For sure we can run the tiny backend on a HF Space. I just need to optimize the communication between the frontend and/or the streaming layer, avoiding resending data that’s...
If we rely only on the GGML public API, the tooltip can safely decode any block using ggml_get_type_traits(type)->to_float, which gives us the FP32 values directly. That works fine and remains...
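As a rough picture of what a per-block `to_float` conversion does, here is a self-contained sketch that does not call GGML at all. `ToyBlock`, `toy_to_float`, and `toy_value_at` are hypothetical names, and the layout (32 signed 8-bit values sharing one scale) is a simplified stand-in loosely modelled on Q8_0. Real GGML blocks store the scale as FP16 and differ per type, which is exactly why going through the public type traits is the safer route.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 32 values per block, as in the simplest GGML quant types.
constexpr std::size_t kBlockSize = 32;

// Hypothetical, simplified block: one FP32 scale plus 32 int8 quants.
struct ToyBlock {
    float scale;
    std::array<std::int8_t, kBlockSize> q;
};

// Dequantize every block: value = scale * quant.
std::vector<float> toy_to_float(const std::vector<ToyBlock>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const ToyBlock& b : blocks) {
        for (std::int8_t q : b.q) {
            out.push_back(b.scale * static_cast<float>(q));
        }
    }
    return out;
}

// Decode a single element, e.g. the one under the mouse cursor.
float toy_value_at(const std::vector<ToyBlock>& blocks, std::size_t index) {
    assert(index < blocks.size() * kBlockSize);
    const ToyBlock& b = blocks[index / kBlockSize];  // block holding the element
    return b.scale * static_cast<float>(b.q[index % kBlockSize]);
}
```

For a tooltip, decoding only the block under the cursor, as `toy_value_at` does, avoids dequantizing the whole tensor on every hover.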
You're absolutely right: this is the core issue I ran into as well. The current behavior of always sending the full WebUI config overrides any server-side defaults, even when...
I’ve reverted my previous PR (reasoning-format-minimax-m2) and merged PR #16932 into my testing-branch16 for isolated testing. I’m running llama-swap with the new XML tool-call parser to check MiniMax-M2 compatibility without...
> Oh! It seems you’re using non-streaming mode. I can now reproduce your issue with `stream: false`.
>
> Let me dig into what’s happening…

Yes, exactly: it works correctly...