Justine Tunney

533 comments

That number is intentional. When you specify a number that's too high, it automatically adjusts down to the number of layers in the model.
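A hedged sketch of that clamping behavior, assuming this refers to the GPU layer count flag (the names are illustrative, not the actual llamafile source):

```c
// Illustrative only: a requested layer count larger than what the model
// actually contains is silently adjusted down to the model's layer count.
static int resolve_layer_count(int requested, int model_layers) {
    return requested < model_layers ? requested : model_layers;
}
```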

Thank you @laooopooo. Does it work for you if you add `-DGGML_CUDA_FORCE_MMQ`?

I'm glad to hear that. Here are the AVX2 and AVX-512 variations if you want to try them out:

```c
inline __m256 llamafile_expf_avx2(__m256 x) {
    const __m256 r = _mm256_set1_ps(0x1.8p23f);
    const...
```
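For context, kernels like this follow the standard expf range reduction exp(x) = 2^n · exp(r), where n = round(x · log2(e)): the constant 0x1.8p23f is the round-to-nearest shifter and 0x1.715476p+0f is log2(e). The scalar sketch below is my own illustration of the scheme, not the llamafile code; the polynomial is a plain Taylor stub rather than tuned minimax coefficients.

```c
#include <stdint.h>

// Sketch of the range-reduction scheme behind the vectorized kernels.
// No special-case handling (overflow, underflow, NaN) is included.
static float expf_sketch(float x) {
    const float shifter = 0x1.8p23f;         // 1.5 * 2^23 forces round-to-nearest
    float z = x * 0x1.715476p+0f + shifter;  // x * log2(e), shifted
    float n = z - shifter;                   // round(x * log2(e))
    float b = x - n * 0x1.62e43p-1f;         // reduced argument r = x - n*ln(2)
    union { uint32_t i; float f; } scale;    // build 2^n via the exponent field
    scale.i = (uint32_t)((int32_t)n + 127) << 23;
    float p = 1.0f + b * (1.0f + b * (0.5f + b * (1.0f / 6)));  // Taylor exp(r)
    return p * scale.f;
}
```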

@ggerganov Running your command, I'm noticing the advantage here increases from 1.5x to 1.9x if we include AVX2. On znver4, if we also include AVX-512, that goes up to...

@chriselrod Could you help me modify my AVX-512 intrinsics to use `_mm512_scalef_ps` (`vscalefps`) like your code? I'm currently talking to ARM Limited about getting these functions into glibc, since our...
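For reference, `_mm512_scalef_ps` computes a · 2^⌊b⌋ elementwise, so the 2^n scaling step can stay in the floating-point domain instead of being built with integer exponent-field arithmetic. A hedged sketch of that substitution (the function name is mine, not from either codebase):

```c
#include <immintrin.h>

// Apply the 2^n scaling step of expf with vscalefps: p is the polynomial
// approximation of exp(r), and n is round(x*log2(e)) kept as a float
// vector. _mm512_scalef_ps returns p * 2^floor(n), avoiding the manual
// exponent-field bit twiddling needed on AVX2.
static inline __m512 scale_by_pow2(__m512 p, __m512 n) {
    return _mm512_scalef_ps(p, n);
}
```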

I just imported stable-diffusion.cpp into the llamafile codebase, which uses these `expf()` functions, and things work fine. I'm not seeing any black squares. I even enabled trapping math to be...
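For anyone wanting to reproduce a trapping-math check like this on glibc, one way is `feenableexcept()`, a GNU extension. A minimal sketch, assuming that mechanism:

```c
#define _GNU_SOURCE
#include <fenv.h>

// Turn IEEE exceptions into SIGFPE so any NaN-producing or overflowing
// operation crashes loudly instead of propagating silently. This is one
// way to verify the expf() kernels never generate invalid results.
int main(void) {
    feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);
    /* ... run the workload under test ... */
    return 0;
}
```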

The `INFINITY` constant alone is used 83 times in the llama.cpp codebase, so compiling with `-ffinite-math-only` might not be a bright idea. If you want us to stop using infinity...
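To illustrate the hazard with a minimal example of my own (not code from llama.cpp): under `-ffinite-math-only` the compiler may assume no value is ever infinite, so guards that compare against `INFINITY` can be folded into dead code.

```c
#include <math.h>

// With -ffinite-math-only, the compiler may assume x is never infinite,
// so this comparison can be optimized to "false" and the guard silently
// removed, breaking any logic that uses -INFINITY as a mask or sentinel.
float clamp_masked_logit(float x) {
    if (x == -INFINITY)
        return -1e30f;  // hypothetical large-negative replacement
    return x;
}
```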

I concur. I tested every single one of the `-ffast-math` flags and I couldn't find any improvements in my accuracy script, except for `-funsafe-math-optimizations`, which caused a 20% reduction in...

Thanks for helping @vlasky! You can also say `-c 0` as an easy way to set the max context size allowed by the model.

OK, you have a Sandy Bridge CPU. Five years past EOL, but still supported by us. Could you run `./llava-v1.5-7b-q4.llamafile --version` and tell me what it says? It'd help to know what...