FNsi

68 comments by FNsi

> There are some interesting things to see in the KV cache, for example, it seems the K data seems to change little from token to token. Since the rope...

In my view it's better to keep more releases, so we can figure out where the problems are coming from.

That master commit moves Q8_0 from 5 to 6 in ggml.h (lines 208 to 209), which left me confused about whether 6 is q4_3 or q8_0. After reading more code...

The LLAMA_FTYPE_MOSTLY values in llama.h don't match the GGML_TYPE numbers in ggml.h.

Thank you for the great work. ~~Currently perplexity is not working in that PR.~~ ~~Running perplexity, it got stuck after showing the 655 chunks, batch_size=512; the GPU is still working. Let me try...

> @slaren can you check in CUDA, currently `--memory_f32` is broken for me.

This `--memory_f32` is working for me with gfx1035 (HSA gfx1030), i.e. the Vega integrated GPU 680M. More detail: I...
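For context, "gfx1035 (HSA gfx1030)" usually refers to ROCm's GFX-version override, which makes an iGPU without official kernels use the gfx1030 binaries. This is an assumption about the setup being described, not something stated in the thread; a minimal sketch:

```shell
# Tell ROCm's HSA runtime to treat the GPU as gfx1030 (version 10.3.0),
# a common workaround for gfx1035 iGPUs such as the Radeon 680M.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
echo "HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION}"
```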

> > I suspect it has something to do with the GPU architecture that is being built. My Makefile changes will detect the GPU of your system but that may...

> > Try `export CXX=hipcc` before you compile?
>
> i don't think that's correct. i was using hipcc in my code, but SlyEcho is using...

I think it might be like the full output of a huge model being interrupted after the first few layers. Like, if the 65B output is reasonable, output_65b * layers_7b / layers_65b...

Can you guys make a cache layer for the most frequent words? I guess that would be fast in many ways.