Justine Tunney
Well, there would be a different prefix for every third-party library, because they're all in their own Bazel repositories.
> Does the performance matter in practice? I ran strace on GCC invocations for TensorFlow and determined those stat() system calls were >50% of wall time. So yes, but with...
Yes, but when are modules actually going to happen? And will they require existing codebases to be rewritten in order to reap the benefits? https://youtu.be/ND-TuW0KIgg
Please run:

```
./llamafile.exe -m Meta-Llama-3-70B-Instruct.Q4_0.llamafile -ngl 999 --port 7777 --strace
```

And copy/paste me the last 20 or so lines that happen before it crashes. Next, run...
Also, I just have to ask, do you really have a graphics card on Windows with 50 GB of VRAM?
@JohannesGaessler Only 13% of bf16 numbers can be represented accurately by a bf16 -> fp16 conversion. https://justine.lol/tmp/bf16-to-fp16.txt Yes, the vast majority of weights cluster within that 13%. By my calculation,...
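If you want to reproduce a figure like that yourself, here's a minimal sketch of the idea (my own illustration, not the script behind that link): enumerate every bf16 bit pattern, widen it to float32, round-trip it through fp16, and count how many survive exactly. It assumes a compiler with the `_Float16` extension (recent GCC/Clang on x86-64).

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// bf16 is just the high 16 bits of a float32, so decoding is a shift.
static float bf16_to_fp32(uint16_t h) {
    uint32_t i = (uint32_t)h << 16;
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}

int main(void) {
    int exact = 0, total = 0;
    for (uint32_t bits = 0; bits < 0x10000; ++bits) {
        float f = bf16_to_fp32((uint16_t)bits);
        if (isnan(f)) continue;          // skip NaN payloads
        ++total;
        float g = (float)(_Float16)f;    // round-trip through fp16
        if (f == g) ++exact;
    }
    printf("%d of %d bf16 values survive fp16 exactly (%.1f%%)\n",
           exact, total, 100.0 * exact / total);
    return 0;
}
```

The intuition is that fp16 has more mantissa bits than bf16 (10 vs. 7) but far fewer exponent bits (5 vs. 8), so only bf16 values whose magnitude falls inside fp16's narrow exponent range convert losslessly.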
You might find the differences negligible, but it's important to me. I want llamafile to be able to deliver, to the best of its ability, whatever number of bits are...
I don't make any demands on your time. In terms of resources, Mozilla is sponsoring me to help llama.cpp, so you've got a lot more resources than before. At the...
Here's the decoding process for bfloat16:

```c
typedef struct { uint16_t x; } ggml_bf16_t;

/**
 * Converts brain16 to float32.
 */
static inline float ggml_bf16_to_fp32(ggml_bf16_t h) {
    union {
        float f;
        uint32_t i;
    } u;
    u.i = (uint32_t)h.x << 16;  // bf16 occupies the high 16 bits of an fp32
    return u.f;
}
```
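For context, the reverse direction can be sketched the same way. This is only an illustration of the standard round-to-nearest-even truncation (the function name is mine, and NaN handling is omitted), not necessarily the exact upstream code:

```c
/**
 * Converts float32 to brain16 (illustrative sketch, round-to-nearest-even).
 * NaN inputs would need special handling before the bias is added.
 */
static inline ggml_bf16_t ggml_fp32_to_bf16_sketch(float s) {
    ggml_bf16_t h;
    union {
        float f;
        uint32_t i;
    } u;
    u.f = s;
    // Add the rounding bias, then truncate to the high 16 bits.
    h.x = (u.i + (0x7fff + ((u.i >> 16) & 1))) >> 16;
    return h;
}
```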
@Artefact2 I've updated `gguf-py/gguf/constants.py` so that BF16 is listed. I have no idea how to make the Python script generate BF16 GGML files. What I've been doing is running `convert.py...