Justine Tunney

533 comments

That number is intentional. When you specify a number that's too high, it automatically adjusts down to the number of layers in the model.
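A hedged sketch of that clamping behavior, assuming this refers to the GPU layer count flag (the names are illustrative, not the actual llamafile source):

```c
// Illustrative only: a requested layer count larger than what the model
// actually contains is silently adjusted down to the model's layer count.
static int resolve_layer_count(int requested, int model_layers) {
    return requested < model_layers ? requested : model_layers;
}
```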

Thank you @laooopooo. Does it work for you if you add `-DGGML_CUDA_FORCE_MMQ`?

I'm glad to hear that. Here are the AVX2 and AVX-512 variations if you want to try them out:

```c
inline __m256 llamafile_expf_avx2(__m256 x) {
    const __m256 r = _mm256_set1_ps(0x1.8p23f);
    const...
```
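For context, kernels like this follow the standard expf range reduction exp(x) = 2^n · exp(r), where n = round(x · log2(e)): the constant 0x1.8p23f is the round-to-nearest shifter and 0x1.715476p+0f is log2(e). The scalar sketch below is my own illustration of the scheme, not the llamafile code; the polynomial is a plain Taylor stub rather than tuned minimax coefficients.

```c
#include <stdint.h>

// Sketch of the range-reduction scheme behind the vectorized kernels.
// No special-case handling (overflow, underflow, NaN) is included.
static float expf_sketch(float x) {
    const float shifter = 0x1.8p23f;         // 1.5 * 2^23 forces round-to-nearest
    float z = x * 0x1.715476p+0f + shifter;  // x * log2(e), shifted
    float n = z - shifter;                   // round(x * log2(e))
    float b = x - n * 0x1.62e43p-1f;         // reduced argument r = x - n*ln(2)
    union { uint32_t i; float f; } scale;    // build 2^n via the exponent field
    scale.i = (uint32_t)((int32_t)n + 127) << 23;
    float p = 1.0f + b * (1.0f + b * (0.5f + b * (1.0f / 6)));  // Taylor exp(r)
    return p * scale.f;
}
```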

@ggerganov Running your command, I'm noticing the advantage here increases from 1.5x to 1.9x if we include AVX2. On znver4, if we also include AVX-512, that goes up to...

@chriselrod Could you help me modify my AVX-512 intrinsics to use `_mm512_scalef_ps` (`vscalefps`) like your code? I'm currently talking to ARM Limited about getting these functions into glibc, since our...
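For reference, `_mm512_scalef_ps` computes a · 2^⌊b⌋ elementwise, so the 2^n scaling step can stay in the floating-point domain instead of being built with integer exponent-field arithmetic. A hedged sketch of that substitution (the function name is mine, not from either codebase):

```c
#include <immintrin.h>

// Apply the 2^n scaling step of expf with vscalefps: p is the polynomial
// approximation of exp(r), and n is round(x*log2(e)) kept as a float
// vector. _mm512_scalef_ps returns p * 2^floor(n), avoiding the manual
// exponent-field bit twiddling needed on AVX2.
static inline __m512 scale_by_pow2(__m512 p, __m512 n) {
    return _mm512_scalef_ps(p, n);
}
```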

I just imported stable-diffusion.cpp into the llamafile codebase, which uses these `expf()` functions, and things work fine. I'm not seeing any black squares. I even enabled trapping math to be...
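For anyone wanting to reproduce a trapping-math check like this on glibc, one way is `feenableexcept()`, a GNU extension. A minimal sketch, assuming that mechanism:

```c
#define _GNU_SOURCE
#include <fenv.h>

// Turn IEEE exceptions into SIGFPE so any NaN-producing or overflowing
// operation crashes loudly instead of propagating silently. This is one
// way to verify the expf() kernels never generate invalid results.
int main(void) {
    feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);
    /* ... run the workload under test ... */
    return 0;
}
```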

The `INFINITY` constant alone is used 83 times in the llama.cpp codebase, so compiling with `-ffinite-math-only` might not be a bright idea. If you want us to stop using infinity...
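To illustrate the hazard with a minimal example of my own (not code from llama.cpp): under `-ffinite-math-only` the compiler may assume no value is ever infinite, so guards that compare against `INFINITY` can be folded into dead code.

```c
#include <math.h>

// With -ffinite-math-only, the compiler may assume x is never infinite,
// so this comparison can be optimized to "false" and the guard silently
// removed, breaking any logic that uses -INFINITY as a mask or sentinel.
float clamp_masked_logit(float x) {
    if (x == -INFINITY)
        return -1e30f;  // hypothetical large-negative replacement
    return x;
}
```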

I concur. I tested every single one of the `-ffast-math` flags and I couldn't find any improvements in my accuracy script, except for `-funsafe-math-optimizations`, which caused a 20% reduction in...

Thanks for helping @vlasky! You can also say `-c 0` as an easy way to set the max context size allowed by the model.

OK, you have a Sandy Bridge CPU. Five years past EOL, but still supported by us. Could you run `./llava-v1.5-7b-q4.llamafile --version` and tell me what it says? It'd help to know what...