FNsi

68 comments by FNsi

> There are some interesting things to see in the KV cache, for example, it seems the K data seems to change little from token to token. Since the rope...

In my view it's better to keep more releases, so we can figure out where the problems are coming from.

That master commit moves Q8_0 from 5 to 6 in ggml.h (lines 208 to 209), which left me confused about whether 6 is q4_3 or q8_0. After reading more code...

The LLAMA_FTYPE_MOSTLY values in llama.h don't match the GGML_TYPE numbers in ggml.h.

Thank you for the great work. ~~Currently perplexity is not working in that PR.~~ ~~Running perplexity, it got stuck after showing the 655 chunks, batch_size=512; the GPU is still working. Let me try...

> @slaren can you check in CUDA, currently `--memory_f32` is broken for me.

This `--memory_f32` is working for me with gfx1035 (HSA gfx1030), i.e. the Vega integrated GPU 680M. More detail: I...
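For context, "gfx1035 (HSA gfx1030)" usually refers to ROCm's GFX-version override, which makes an iGPU without official kernels use the gfx1030 binaries. This is an assumption about the setup being described, not something stated in the thread; a minimal sketch:

```shell
# Tell ROCm's HSA runtime to treat the GPU as gfx1030 (version 10.3.0),
# a common workaround for gfx1035 iGPUs such as the Radeon 680M.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
echo "HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION}"
```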

> > I suspect it has something to do with the GPU architecture that is being built. My Makefile changes will detect the GPU of your system but that may...

> > Try `export CXX=hipcc` before you compile?
>
> i don't think that's correct. i was using hipcc in my code, but SlyEcho is using...

I think it might be like the full output of a huge model being interrupted after the first few layers. Like, if the 65B output is reasonable, output_65b * layers_7b / layers_65b...

Can you guys make a cache layer for the most frequent words? I guess that would be fast in many ways.