mlc-llm
[SLM] Support BERT architecture. Implement a text embedding module
This PR adds text embedding support to MLC-LLM via a BERT encoder-only model.
Example usage: https://github.com/rickzx/mlc-llm/blob/18aa7ee378b826a61ce4baa98e4bab1bf3d64038/python/mlc_llm/embeddings/embeddings.ipynb
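For readers unfamiliar with encoder-only embedding models: a BERT encoder produces one hidden vector per input token, and a pooling step collapses these into a single sentence embedding. The snippet below is not the mlc_llm API from this PR (see the linked notebook for that); it is only a minimal, self-contained sketch of masked mean pooling, a common final step in such a pipeline. All names are illustrative.

```python
# Sketch of masked mean pooling: average per-token encoder outputs into one
# sentence vector, excluding padding positions. Illustrative only; not the
# mlc_llm embeddings API.
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding tokens.

    hidden_states: (batch, seq_len, hidden) per-token encoder outputs
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)  # avoid division by zero
    return summed / counts

# Toy example: batch of 1, seq_len 3 (last position is padding), hidden size 2.
h = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
m = np.array([[1, 1, 0]])
print(mean_pool(h, m))  # padding token is excluded from the average
```

In practice the pooled vector is often L2-normalized afterwards so that cosine similarity reduces to a dot product.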
This is a good first step towards embedding support through the python-level API. It would also be worth thinking about what it takes to bring this into the ThreadEngine. That would require supporting multiple models, but it would also give us the opportunity to expose a universal embedding endpoint.
Please fix the Jenkins CI failure here.
This should be addressed by https://github.com/mlc-ai/mlc-llm/pull/2292. I'm triggering a rebuild now.
The CUDA error should be fixed by https://github.com/apache/tvm/pull/16982.