feat: add changes to handle jina v2 chinese code
Adds the changes needed to handle the Chinese model from Jina AI.
Changes description:
- Add a vocab preprocessing step to handle the Lowercase + Whitespace preprocessing of Jina-v2-ZH.
- Soften some exceptions in the vocab validation (this will be the most controversial change in the PR).
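As a rough illustration of what a Lowercase + Whitespace preprocessing step does, here is a minimal Python sketch. It assumes that "Whitespace" refers to the Hugging Face tokenizers Whitespace pre-tokenizer, which splits with the regex \w+|[^\w\s]+. The function name preprocess is hypothetical, and this is not the code added to llama.cpp.

```python
import re

# Assumption: same pattern as the Hugging Face `Whitespace` pre-tokenizer.
# It keeps runs of word characters together and splits punctuation off.
_WHITESPACE_PATTERN = re.compile(r"\w+|[^\w\s]+")


def preprocess(text):
    """Lowercase the text, then split it into pre-tokens (hypothetical helper)."""
    return _WHITESPACE_PATTERN.findall(text.lower())


if __name__ == "__main__":
    print(preprocess("Hello,  World!"))
    print(preprocess("你好 World"))
```

The resulting pre-tokens would then be passed to the BPE merge step one by one, so lowercasing has to happen before any vocab lookup.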
Why those changes?
- NULL is in the vocab (Not sure why):
"vocab": {
"<s>": 0,
"<pad>": 1,
"</s>": 2,
"<unk>": 3,
"<mask>": 4,
"\u0000": 5,
"\u0001": 6,
"\u0002": 7,
I think the relevant assertion is GGML_ASSERT(vocab.id_to_token.size() == vocab.token_to_id.size());. I am not sure this extra restriction provides much value.
- Newline not found in vocab (Ċ). This only affects linefeed_id, which does not affect inference at all.
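The two observations above can be checked against a vocab with a short Python sketch. The first helper reports token texts shared by more than one id, which is the situation in which a size check like id_to_token.size() == token_to_id.size() would fail. The second looks up the newline symbol and falls back to another id when it is missing. Ċ is the character that GPT-2 style byte-level BPE uses for the newline byte. The helper names and the choice of fallback id are assumptions for illustration, not what llama.cpp does.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each byte to a printable Unicode character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def find_duplicate_tokens(id_to_token):
    """Return {text: [ids]} for every token text used by more than one id."""
    seen = {}
    for token_id, text in enumerate(id_to_token):
        seen.setdefault(text, []).append(token_id)
    return {text: ids for text, ids in seen.items() if len(ids) > 1}


def find_linefeed_id(token_to_id, fallback_id):
    """Look up the newline token; use fallback_id (an assumption) if absent."""
    newline_symbol = bytes_to_unicode()[ord("\n")]  # "Ċ" (U+010A)
    return token_to_id.get(newline_symbol, fallback_id)


if __name__ == "__main__":
    # Only the vocab entries quoted in the PR description.
    vocab_excerpt = ["<s>", "<pad>", "</s>", "<unk>", "<mask>",
                     "\u0000", "\u0001", "\u0002"]
    token_to_id = {text: i for i, text in enumerate(vocab_excerpt)}
    print(len(vocab_excerpt) == len(token_to_id))
    print(find_duplicate_tokens(vocab_excerpt))
    print(find_linefeed_id(token_to_id, fallback_id=token_to_id["<pad>"]))
```

On the quoted excerpt alone, the two sizes match and no duplicates are reported, so the excerpt by itself does not show which entry trips the assertion.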
📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 541 iterations 🚀
- Concurrent users: 8, duration: 10m
- HTTP request : avg=8663.08ms p(95)=20014.98ms fails=, finish reason: stop=488 truncated=53
- Prompt processing (pp): avg=103.53tk/s p(95)=514.76tk/s
- Token generation (tg): avg=34.38tk/s p(95)=48.3tk/s
- ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feat-jina-embeddings-v2-zh commit=728e1b4da0cbed99b817016115ec1a30f7281d61
[Chart: llamacpp:prompt_tokens_seconds over time (1717751407 --> 1717752031), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations]
[Chart: llamacpp:predicted_tokens_seconds over time (1717751407 --> 1717752031), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations]
[Chart: llamacpp:kv_cache_usage_ratio over time (1717751407 --> 1717752031), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations]
[Chart: llamacpp:requests_processing over time (1717751407 --> 1717752031), llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations]
Is this still being worked on? Jina models are great for RAG scenarios. Right now, models like cwchang/jina-embeddings-v2-base-zh in ollama show errors without llama.cpp support.