Rick Zhou
Attention in GPT-2 must also support the config option `scale_attn_by_inverse_layer_idx`. When it is enabled, each attention layer's scores are multiplied by a per-layer factor, 1 / (layer_idx + 1), before the softmax is applied. This PR supports...
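The following is a minimal pure-Python sketch of where the per-layer factor enters one causal attention head. It assumes the behavior of the Hugging Face transformers GPT-2 attention: divide the usual 1/sqrt(head_dim)-scaled dot-product scores by `layer_idx + 1`, then apply the causal mask and softmax. The function name `causal_attention_probs` and its list-of-lists inputs are hypothetical illustrations, not MLC-LLM code.

```python
import math
from typing import List

Vector = List[float]


def causal_attention_probs(
    q: List[Vector],
    k: List[Vector],
    layer_idx: int,
    scale_attn_by_inverse_layer_idx: bool = True,
) -> List[Vector]:
    """Hypothetical helper: attention probabilities for one causal head."""
    head_dim = len(q[0])
    probs = []
    for i, qi in enumerate(q):
        # Causal mask: query i may only attend to keys 0..i.
        scores = []
        for kj in k[: i + 1]:
            s = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(head_dim)
            if scale_attn_by_inverse_layer_idx:
                # Extra per-layer factor 1 / (layer_idx + 1), applied before softmax.
                s /= float(layer_idx + 1)
            scores.append(s)
        # Numerically stable softmax over the unmasked scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Masked (future) positions get probability 0.
        row = [e / total for e in exps] + [0.0] * (len(k) - i - 1)
        probs.append(row)
    return probs
```

At layer 0 the factor is 1, so the option has no effect. In deeper layers it shrinks the scores toward zero and flattens the softmax distribution.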
Subset: {0=1.0, 1=1.0, 2=1.0, 3=1.0, 4=1.0, 5=1.0, 6=1.0, 7=1.0, 8=1.0, 9=1.0, 10=1.0, 11=1.0, 12=1.0, 13=1.0, 14=1.0, 15=1.0, 16=1.0, 17=1.0, 18=1.0, 19=1.0, 20=1.0, 21=1.0, 22=1.0, 23=1.0} [100, 100, 100, 100, 100,...
This PR adds support for text embeddings in MLC-LLM using a BERT encoder-only model. Example usage: https://github.com/rickzx/mlc-llm/blob/18aa7ee378b826a61ce4baa98e4bab1bf3d64038/python/mlc_llm/embeddings/embeddings.ipynb
This PR adds a function calling example with JSON schema, using the Hermes-2-Pro-Mistral-7B model. The system prompt is adapted from: https://github.com/NousResearch/Hermes-Function-Calling/blob/main/prompt_assets/sys_prompt.yml#L29 Query: ``` What is the current weather in celsius in...