llamafile
Support BERT architecture in llamafile
Many popular, high-quality sentence embedding models are based on the BERT architecture, e.g. those from sentence-transformers. Sentence embedding models are important for many use cases, e.g. RAG and lightweight classifiers.
llama.cpp already seems to support BERT, at least as of the most recent release tag b2589, though I only tested with this model: https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/blob/main/all-MiniLM-L6-v2.F32.gguf
The above model has also worked in several previous llamafile commits, most recently ab7b75d. Out of curiosity I tested all commits from cb92b3 back to 663382, results below:
1 means 'broken'
0 means 'not broken'
---
1 cb92b32a6dfae4dff06d7333afe51f2b7224f709 # 2024-04-03, latest
1 56a9d7646c426c3ef4febb87375d994043f3c487
...
1 c0208c155ceacd49fca66614e58edc81768fc1da # 2024-03-22, last llama.cpp sync?
0 ab7b75dd8dd0e8521773184a8a4f8be9c94a02a0
0 725aa3a486c46953280d4d55394b88e57bfd0f86
0 4ab9d7285c08c0ab7996e7a4f0b16d4864560996
0 9c53c272650bb84f3d6fb66aa0dd0fb20f92b18c
0 3913d8858f9d95468ebbb5bdcc1b8a8de053f73c
0 cdc09c35f9dd0b1e2e1d94a0a2b90439f76801c8
0 f33895292ad3a00779b0c796458f1ca3dc18c54d
1 1aee87f716db5fcd4a2192699426799cefb451f2
0 663382a5012d66c0a55199420065adca89a6087c # 2024-03-06
The above model worked from March 7 (f33895) through March 21 (ab7b75d). Then on March 22 (c0208c) there was a llama.cpp sync, and the model hasn't worked in any subsequent commit up to and including cb92b3. I'm not familiar enough with the llamafile or llama.cpp codebases to debug this efficiently, so I'm hoping someone else can help.
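To make the regression window easy to re-derive, here is a minimal sketch that parses the result rows listed above (newest first, `1` = broken, `0` = not broken) and reports every commit pair where the status flips. The names `RESULTS`, `parse_results`, and `find_transitions` are hypothetical helpers for illustration and are not part of the linked test script. The rows elided by `...` in the table are left out rather than guessed.

```python
# Result rows copied from the table above, newest commit first.
# The rows hidden behind "..." in the table are intentionally not reconstructed.
RESULTS = """\
1 cb92b32a6dfae4dff06d7333afe51f2b7224f709 # 2024-04-03, latest
1 56a9d7646c426c3ef4febb87375d994043f3c487
1 c0208c155ceacd49fca66614e58edc81768fc1da # 2024-03-22, last llama.cpp sync?
0 ab7b75dd8dd0e8521773184a8a4f8be9c94a02a0
0 725aa3a486c46953280d4d55394b88e57bfd0f86
0 4ab9d7285c08c0ab7996e7a4f0b16d4864560996
0 9c53c272650bb84f3d6fb66aa0dd0fb20f92b18c
0 3913d8858f9d95468ebbb5bdcc1b8a8de053f73c
0 cdc09c35f9dd0b1e2e1d94a0a2b90439f76801c8
0 f33895292ad3a00779b0c796458f1ca3dc18c54d
1 1aee87f716db5fcd4a2192699426799cefb451f2
0 663382a5012d66c0a55199420065adca89a6087c # 2024-03-06
"""


def parse_results(text):
    """Return [(sha, is_broken), ...] in the order given (newest first)."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line == "...":
            continue
        status, sha = line.split()
        rows.append((sha, status == "1"))
    return rows


def find_transitions(rows):
    """Walk oldest -> newest and list each adjacent pair whose status differs."""
    oldest_first = rows[::-1]
    transitions = []
    for (old_sha, old_broken), (new_sha, new_broken) in zip(oldest_first, oldest_first[1:]):
        if old_broken != new_broken:
            kind = "broke" if new_broken else "fixed"
            transitions.append((old_sha[:7], new_sha[:7], kind))
    return transitions


if __name__ == "__main__":
    for older, newer, kind in find_transitions(parse_results(RESULTS)):
        print(f"{older} -> {newer}: {kind}")
```

The last transition it reports is the one of interest here: the change from ab7b75d (working) to c0208c1 (broken), which matches the llama.cpp sync noted above.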
For reference:
- tested using this script: https://gist.github.com/k8si/d6d2b4aed668d21aaf30e8246d1a4e21
- only tested on Mac M2 Pro (macOS 14.2.1)