CTranslate2
Inference failed with "axis 2 has dimension xxxx but expected yyyy" error
I tried to use ctranslate2 as the inference framework for my model, but inference failed with the following error: "axis 2 has dimension 8192 but expected 7680"
What I've done:
- First, I converted the model to the CT2 format. Because the model is large, I used the quantization parameter to reduce the model file size: converter.convert(output_dir, quantization="int8", force=True)
- Then I loaded the quantized model and ran inference. Unfortunately, I hit the following error: "axis 2 has dimension 8192 but expected 7680"
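For reference, the conversion step could be reproduced with something like the sketch below. It assumes the converter is `ctranslate2.converters.TransformersConverter` and that the source checkpoint is the Hugging Face model `google/gemma-2-9b-it`; neither is stated explicitly in this report, and the helper name `convert_to_ct2` is only illustrative.

```python
def convert_to_ct2(model_name_or_path, output_dir, quantization="int8"):
    """Convert a Hugging Face checkpoint to the CTranslate2 format.

    Hypothetical helper: the converter class and its arguments are an
    assumption about how the conversion in this report was run.
    """
    # Imported lazily so the helper can be defined without ctranslate2 installed.
    import ctranslate2

    converter = ctranslate2.converters.TransformersConverter(model_name_or_path)
    # force=True overwrites output_dir if it already exists.
    return converter.convert(output_dir, quantization=quantization, force=True)


# Example call (downloads the checkpoint; the model ID is an assumption):
# convert_to_ct2("google/gemma-2-9b-it", "gemma-2-9b-it-ct2")
```

The output directory name matches the one passed to the Generator in the inference snippet.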
How can I fix it?
The inference code snippet is below:
`import ctranslate2

# NOTE: tokenizer is not defined in the snippet as posted; it is assumed to be a
# Hugging Face tokenizer for the same model, created earlier.
try:
    # Load the quantized model as a Generator
    generator = ctranslate2.Generator("gemma-2-9b-it-ct2", device="cpu")
    # Prepare the input
    input_text = "Translate this to French: Hello, world!"
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))
    # Run inference with generate_batch
    results = generator.generate_batch([tokens], max_length=50, sampling_topk=1)
    # Decode and print the results
    for result in results:
        output_tokens = result.sequences[0]
        output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))
        print(f"Input: {input_text}")
        print(f"Output: {output_text}")
except Exception as e:
    print(f"Error during model loading or inference: {e}")`