Inference failed with "axis 2 has dimension xxxx but expected yyyy" error

Open GangLiCN opened this issue 5 months ago • 2 comments

I tried to use CTranslate2 as the inference framework for a model, but inference failed with the following error: "axis 2 has dimension 8192 but expected 7680"

What I've done:

  1. First I converted the model to the CT2 format. Because the model is large, I used the quantization parameter to reduce the size of the model files: `converter.convert(output_dir, quantization="int8", force=True)`

  2. Then I loaded the quantized model and ran inference, which failed with this error: "axis 2 has dimension 8192 but expected 7680"
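For completeness, the two steps above correspond roughly to the sketch below. It assumes the converter was `ctranslate2.converters.TransformersConverter`; the helper names `convert_to_ct2` and `load_generator` are only illustrative, and the Hugging Face model ID in the example call is an assumption inferred from the output directory name.

```python
def convert_to_ct2(model_id: str, output_dir: str) -> str:
    """Convert a Hugging Face checkpoint to CTranslate2 format with int8 weights."""
    # Imported lazily so this file can be loaded without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_id)
    # force=True overwrites output_dir if it already exists.
    converter.convert(output_dir, quantization="int8", force=True)
    return output_dir


def load_generator(model_dir: str):
    """Load the converted model on CPU as a ctranslate2.Generator."""
    import ctranslate2

    return ctranslate2.Generator(model_dir, device="cpu")


# Example call (assumed model ID; this downloads the full checkpoint):
# model_dir = convert_to_ct2("google/gemma-2-9b-it", "gemma-2-9b-it-ct2")
# generator = load_generator(model_dir)
```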

How can I fix it?

The inference code snippet is below:

```python
import ctranslate2

# `tokenizer` is the tokenizer for the model, loaded earlier (not shown).
try:
    # Load the quantized model as a Generator
    generator = ctranslate2.Generator("gemma-2-9b-it-ct2", device="cpu")

    # Prepare the input
    input_text = "Translate this to French: Hello, world!"
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))

    # Run inference with generate_batch
    results = generator.generate_batch([tokens], max_length=50, sampling_topk=1)

    # Decode and print the results
    for result in results:
        output_tokens = result.sequences[0]
        output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))
        print(f"Input: {input_text}")
        print(f"Output: {output_text}")

except Exception as e:
    print(f"Error during model loading or inference: {e}")
```
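One observation that may help narrow this down: both numbers in the message can be reproduced with simple arithmetic on the attention dimensions. The sketch below assumes the values from the published `config.json` of gemma-2-9b-it (they are not taken from my converted model), and the idea that the expected size is derived with a query width equal to `hidden_size` is only a guess, not something confirmed from the CTranslate2 source.

```python
# Assumed values from the published config.json of gemma-2-9b-it;
# check them against the checkpoint that was actually converted.
hidden_size = 3584
num_attention_heads = 16
num_key_value_heads = 8
head_dim = 256  # set explicitly; differs from hidden_size // num_attention_heads (224)

# Width of a fused query/key/value projection built from the real head size.
q_width = num_attention_heads * head_dim    # 4096
kv_width = num_key_value_heads * head_dim   # 2048
actual = q_width + 2 * kv_width

# Width obtained if the query part is assumed to be hidden_size wide (a guess).
expected_if_q_is_hidden = hidden_size + 2 * kv_width

print(actual, expected_if_q_is_hidden)  # 8192 7680
```

If this reading is right, the mismatch would come from the model's attention layout rather than from the int8 quantization, so it may depend on whether the installed CTranslate2 version handles a `head_dim` that is not `hidden_size / num_attention_heads`.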

GangLiCN · Sep 03 '24 06:09