mlc-llm
[SLM] Support BERT architecture. Implement a text embedding module
This PR adds text embedding support to MLC-LLM via a BERT encoder-only model.
Example usage: https://github.com/rickzx/mlc-llm/blob/18aa7ee378b826a61ce4baa98e4bab1bf3d64038/python/mlc_llm/embeddings/embeddings.ipynb
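For readers unfamiliar with encoder-only embedding models: a BERT encoder produces one hidden vector per input token, and a pooling step collapses these into a single sentence embedding. The snippet below is not the mlc_llm API from this PR (see the linked notebook for that); it is only a minimal, self-contained sketch of masked mean pooling, a common final step in such a pipeline. All names are illustrative.

```python
# Sketch of masked mean pooling: average per-token encoder outputs into one
# sentence vector, excluding padding positions. Illustrative only; not the
# mlc_llm embeddings API.
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding tokens.

    hidden_states: (batch, seq_len, hidden) per-token encoder outputs
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)  # avoid division by zero
    return summed / counts

# Toy example: batch of 1, seq_len 3 (last position is padding), hidden size 2.
h = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
m = np.array([[1, 1, 0]])
print(mean_pool(h, m))  # padding token is excluded from the average
```

In practice the pooled vector is often L2-normalized afterwards so that cosine similarity reduces to a dot product.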
This is a good first step towards embedding support through the python-level API. It would also be worth thinking about what it takes to bring this into the ThreadEngine. That would require supporting multiple models, but it would also give us the opportunity to expose a universal embedding endpoint.
Please fix the Jenkins CI failure here.
This should be addressed by https://github.com/mlc-ai/mlc-llm/pull/2292. I'm triggering a rebuild now.
The CUDA error should be fixed by https://github.com/apache/tvm/pull/16982.