
feat: add changes to handle jina v2 chinese code

Open JoanFM opened this issue 1 year ago • 2 comments

Adds the changes needed to handle the Chinese model from Jina AI.

Changes description:

  • Add a preprocessing vocab to handle the Lowercase + Whitespace preprocessing of Jina-v2-ZH.
  • Soften some of the exceptions raised when validating the vocab (this will be the most controversial change in the PR).
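As a rough illustration of what a Lowercase + Whitespace preprocessing step does, here is a minimal Python sketch. It assumes that "Whitespace" means the rule used by the Hugging Face tokenizers Whitespace pre-tokenizer (the regex \w+|[^\w\s]+). The function name is hypothetical, and this is not the code from the PR.

```python
import re

# Assumed splitting rule: runs of word characters, or runs of
# characters that are neither word characters nor whitespace.
_WHITESPACE_SPLIT = re.compile(r"\w+|[^\w\s]+")


def lowercase_whitespace_pretokenize(text: str) -> list[str]:
    """Lowercase the input, then split it into pre-tokens (hypothetical helper)."""
    return _WHITESPACE_SPLIT.findall(text.lower())


if __name__ == "__main__":
    print(lowercase_whitespace_pretokenize("Hello, World! 你好 Jina"))
```

Lowercasing happens before the split, so the resulting pre-tokens are what would then be matched against the model's vocab.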

Why those changes?

  • NULL (\u0000) is in the vocab (not sure why):
    "vocab": {
      "<s>": 0,
      "<pad>": 1,
      "</s>": 2,
      "<unk>": 3,
      "<mask>": 4,
      "\u0000": 5,
      "\u0001": 6,
      "\u0002": 7,

I think the relevant assertion is GGML_ASSERT(vocab.id_to_token.size() == vocab.token_to_id.size());. I am not sure this extra restriction adds much value.
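To make the size check concrete: a map keeps one entry per distinct key, so the two containers can only differ in size if two ids end up with the same token text. The Python sketch below mimics that with a list and a dict. The variable names mirror the fields in the assertion, but the helper and the sample tokens are hypothetical, and whether a duplicate is what actually happens with \u0000 in this vocab is an assumption, not something confirmed here.

```python
def build_vocab_maps(tokens):
    """Build an id -> text list and a text -> id dict (hypothetical helper)."""
    id_to_token = list(tokens)
    token_to_id = {}
    for token_id, text in enumerate(tokens):
        # A repeated text overwrites the earlier id, so the dict shrinks
        # relative to the list.
        token_to_id[text] = token_id
    return id_to_token, token_to_id


def sizes_match(tokens):
    """Python analogue of the size comparison inside the assertion."""
    id_to_token, token_to_id = build_vocab_maps(tokens)
    return len(id_to_token) == len(token_to_id)


if __name__ == "__main__":
    print(sizes_match(["<s>", "<pad>", "</s>", "<unk>"]))  # all texts distinct
    print(sizes_match(["<s>", "<pad>", "</s>", "<pad>"]))  # one text repeated
```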

  • The newline token (Ċ) is not found in the vocab. This only affects linefeed_id, which does not affect inference at all.
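For context, Ċ is how GPT-2 style byte-level BPE vocabularies spell the newline byte: bytes that are not printable are shifted to code points starting at 256. The sketch below reproduces that mapping and then looks the token up with a fallback instead of raising. find_linefeed_id and its fallback_id argument are hypothetical and only illustrate the idea of softening the exception; they are not llama.cpp's actual code.

```python
def bytes_to_unicode():
    """GPT-2 style mapping from byte values to printable characters."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shifted = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are remapped to 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + shifted)
            shifted += 1
    return dict(zip(byte_values, map(chr, code_points)))


def find_linefeed_id(vocab, fallback_id):
    """Return the id of the newline token, or fallback_id if it is absent."""
    linefeed_token = bytes_to_unicode()[ord("\n")]
    return vocab.get(linefeed_token, fallback_id)


if __name__ == "__main__":
    print(bytes_to_unicode()[ord("\n")])
    print(find_linefeed_id({"<pad>": 1}, fallback_id=1))
```

Which id makes a sensible fallback is a design choice for the PR; the pad token is used above purely as an example.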

JoanFM avatar Jun 06 '24 08:06 JoanFM

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 541 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8663.08ms p(95)=20014.98ms fails=, finish reason: stop=488 truncated=53
  • Prompt processing (pp): avg=103.53tk/s p(95)=514.76tk/s
  • Token generation (tg): avg=34.38tk/s p(95)=48.3tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feat-jina-embeddings-v2-zh commit=728e1b4da0cbed99b817016115ec1a30f7281d61

prompt_tokens_seconds

[Chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations; y-axis: llamacpp:prompt_tokens_seconds; x-axis: 1717751407 --> 1717752031; data series omitted]
                    
predicted_tokens_seconds

[Chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations; y-axis: llamacpp:predicted_tokens_seconds; x-axis: 1717751407 --> 1717752031; data series omitted]
                    


kv_cache_usage_ratio

[Chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations; y-axis: llamacpp:kv_cache_usage_ratio; x-axis: 1717751407 --> 1717752031; data series omitted]
                    
requests_processing

[Chart: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 541 iterations; y-axis: llamacpp:requests_processing; x-axis: 1717751407 --> 1717752031; data series omitted]
                    

github-actions[bot] avatar Jun 06 '24 10:06 github-actions[bot]

Is this still being worked on? Jina models are great for RAG scenarios. And right now, without llama.cpp's support, will models like cwchang/jina-embeddings-v2-base-zh in ollama show errors?

arkohut avatar Sep 30 '24 02:09 arkohut