llamafile Support BERT architecture in llamafile

Open k8si opened this issue 10 months ago • 0 comments

A lot of popular, high-quality sentence embedding models are based on the BERT architecture, e.g. those from sentence-transformers. Sentence embedding models are important for many use cases, e.g. RAG and lightweight classifiers.

Llama.cpp already seems to support BERT, at least as of the most recent release tag b2589, though I only tested with this model: https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/blob/main/all-MiniLM-L6-v2.F32.gguf

The above model has also worked in several previous llamafile commits, most recently ab7b75d. Out of curiosity, I tested all commits from cb92b3 back to 663382; results are below:

1 means 'broken'
0 means 'not broken'
---

1 cb92b32a6dfae4dff06d7333afe51f2b7224f709 # 2024-04-03, latest
1 56a9d7646c426c3ef4febb87375d994043f3c487
...
1 c0208c155ceacd49fca66614e58edc81768fc1da # 2024-03-22, last llama.cpp sync?
0 ab7b75dd8dd0e8521773184a8a4f8be9c94a02a0
0 725aa3a486c46953280d4d55394b88e57bfd0f86
0 4ab9d7285c08c0ab7996e7a4f0b16d4864560996
0 9c53c272650bb84f3d6fb66aa0dd0fb20f92b18c
0 3913d8858f9d95468ebbb5bdcc1b8a8de053f73c
0 cdc09c35f9dd0b1e2e1d94a0a2b90439f76801c8
0 f33895292ad3a00779b0c796458f1ca3dc18c54d 
1 1aee87f716db5fcd4a2192699426799cefb451f2
0 663382a5012d66c0a55199420065adca89a6087c # 2024-03-06

The above model worked from March 7 (f33895) through March 21 (ab7b75d). Then, on March 22 (c0208c), there was a llama.cpp sync, and the model has not worked in any subsequent commit up to cb92b3. I'm not familiar enough with the llamafile or llama.cpp codebases to debug this efficiently, so I'm hoping someone else can help.
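For anyone picking this up, here is a minimal Python sketch that reads the results list above and reports each point where the status flips from 'not broken' to 'broken'. It is only a reading aid, not the script from the gist: the function names `parse_results` and `find_regressions` are made up for illustration, and the elided `...` line is skipped rather than filled in.

```python
# Results copied from the list above, newest commit first.
# 1 means 'broken', 0 means 'not broken'. The "..." line is elided in the
# original list too, so it is skipped here rather than guessed at.
RESULTS = """\
1 cb92b32a6dfae4dff06d7333afe51f2b7224f709
1 56a9d7646c426c3ef4febb87375d994043f3c487
...
1 c0208c155ceacd49fca66614e58edc81768fc1da
0 ab7b75dd8dd0e8521773184a8a4f8be9c94a02a0
0 725aa3a486c46953280d4d55394b88e57bfd0f86
0 4ab9d7285c08c0ab7996e7a4f0b16d4864560996
0 9c53c272650bb84f3d6fb66aa0dd0fb20f92b18c
0 3913d8858f9d95468ebbb5bdcc1b8a8de053f73c
0 cdc09c35f9dd0b1e2e1d94a0a2b90439f76801c8
0 f33895292ad3a00779b0c796458f1ca3dc18c54d
1 1aee87f716db5fcd4a2192699426799cefb451f2
0 663382a5012d66c0a55199420065adca89a6087c
"""


def parse_results(text):
    """Return a list of (sha, is_broken) tuples, in the order given."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "..." placeholder, and anything malformed.
        if len(parts) < 2 or parts[0] not in ("0", "1"):
            continue
        rows.append((parts[1], parts[0] == "1"))
    return rows


def find_regressions(rows_newest_first):
    """Return (last_good_sha, first_bad_sha) pairs, oldest first."""
    rows = list(reversed(rows_newest_first))  # walk history oldest to newest
    pairs = []
    for (prev_sha, prev_broken), (sha, broken) in zip(rows, rows[1:]):
        if not prev_broken and broken:
            pairs.append((prev_sha, sha))
    return pairs


if __name__ == "__main__":
    for good, bad in find_regressions(parse_results(RESULTS)):
        print(f"last good {good[:7]} -> first bad {bad[:7]}")
```

Run on the list above, this reports two transitions: 663382a to 1aee87f (which recovered by f338952) and ab7b75d to c0208c1, the one that still holds as of cb92b3.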

For reference:

Tested using this script: https://gist.github.com/k8si/d6d2b4aed668d21aaf30e8246d1a4e21
Only tested on a Mac M2 Pro (macOS 14.2.1).

Apr 04 '24 17:04 k8si
