Jesse
> It will probably work somewhat (at least at low context lengths) as the paper said it was trained from `3.1-terminus` using around 1T tokens to learn the `top-k` stuff,...
It looks like the chat template has changed since V3.1. Here's the new chat template in human-readable form for reference: https://gist.github.com/createthis/1467af177656a034f14ae01053a714f9
I had to switch back to 3.1-Terminus for agentic work because, with this code, 3.2-Exp falls into degenerate generation at around 45k tokens of context.
I haven't been able to look at this all week due to paid work obligations. However, my weekend has recently cleared, so I'm going to dedicate that time to this...
Just in case anyone else is super dense like me and needs to ask questions, here's that Architecture page from the PDF as markdown. Click details, below, to expand and...
Some cheat-sheet explanations for those of us who didn't get a PhD in machine learning. I have a background in 3D programming and my math education included basic calculus,...
tilelang has a deepseek32 example folder: https://github.com/tile-ai/tilelang/tree/main/examples/deepseek_v32

It includes:
- FP8 lightning indexer: https://github.com/tile-ai/tilelang/blob/main/examples/deepseek_v32/fp8_lighting_indexer.py
- top-k selector: https://github.com/tile-ai/tilelang/blob/main/examples/deepseek_v32/topk_selector.py
- MQA examples (see README)
Here is an analysis by DeepSeek V3.1-Terminus of the fp8 lightning indexer example, when given the DSA PDF context: https://gist.github.com/createthis/0cce8a250daa3a117cb2986c743c02f2

And topk_selector.py: https://gist.github.com/createthis/69417474e24ca7a8096ce5a08227ab0c

This is helpful (to me) because it...
Quick update: Over the weekend I switched from cloning the vLLM functionality to cloning the tilelang functionality. I feel like tilelang's examples are just better organized for educational...
New File Structure:

1. llama-sparse-indexer.cpp/h - Lightning Indexer (FP8 computation)
   - Corresponds to fp8_lighting_indexer.py
   - Implements: $`I_{t,s}=\sum_{j=1}^{H^{I}} w^{I}_{t,j}\cdot\mathrm{ReLU}\left(\mathbf{q}^{I}_{t,j}\cdot \mathbf{k}^{I}_{s}\right)`$
2. llama-sparse-topk.cpp/h - Top-k Selector
   - Corresponds to topk_selector.py
   - Selects...
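
For anyone who finds the index-score formula easier to read as code, here is a minimal pure-Python sketch of it, followed by a naive top-k selection. It is not the tilelang or llama.cpp implementation: the function names `indexer_scores` and `topk_indices` and the nested-list layout are made up for illustration, everything runs in ordinary floats rather than FP8, no causal mask is applied, and the selector simply assumes "top-k" means keeping the k highest-scoring key positions per query token.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def indexer_scores(
    q: List[List[Vector]],  # q[t][j]: indexer query vector for token t, head j
    w: List[List[float]],   # w[t][j]: scalar weight for token t, head j
    k: List[Vector],        # k[s]: indexer key vector for token s
) -> List[List[float]]:
    """Return I[t][s] = sum_j w[t][j] * ReLU(q[t][j] . k[s]).

    Hypothetical helper: floats only (no FP8), no causal masking.
    """
    scores = []
    for t in range(len(q)):
        row = []
        for s in range(len(k)):
            total = 0.0
            for j in range(len(q[t])):
                relu = max(0.0, dot(q[t][j], k[s]))  # ReLU of the head's logit
                total += w[t][j] * relu
            row.append(total)
        scores.append(row)
    return scores


def topk_indices(row: List[float], k: int) -> List[int]:
    """Return the positions of the k largest scores, in ascending position order.

    Assumption: "top-k" means the k highest index scores for one query token.
    """
    order = sorted(range(len(row)), key=lambda s: row[s], reverse=True)
    return sorted(order[:k])


# Tiny example: 1 query token, 2 indexer heads, 3 key tokens, head dim 2.
q = [[[1.0, 0.0], [0.0, 1.0]]]
w = [[0.5, 2.0]]
k = [[1.0, -1.0], [2.0, 3.0], [-1.0, 0.5]]

scores = indexer_scores(q, w, k)
selected = topk_indices(scores[0], 2)
```

In this toy example the second key gets by far the largest score because both heads produce a positive logit for it, while the first key's negative logit on head 2 is zeroed out by the ReLU.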