intel-extension-for-transformers
Talking bot backend for Windows PC is not working; the notebook needs to be updated
I followed the guidelines here:
https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/examples/deployment/talkingbot/server/backend/README.md
First error: the required positional argument 'model_type' is missing; it is not passed in the example.
TypeError Traceback (most recent call last)
Cell In[17], line 7
5 model = Model()
6 model.tokenizer = tokenizer
----> 7 model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False)
10 streamer = TextStreamer(tokenizer)
TypeError: Model.init_from_bin() missing 1 required positional argument: 'model_type'
So I added the argument:
model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False, model_type="llama")
according to the file:
https://github.com/intel/neural-speed/blob/main/neural_speed/init.py
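For reference, the relevant signature there (also visible in the traceback below) appears to be:

def init_from_bin(self, model_type, model_path, **generate_kwargs):
    ...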
After adding the positional argument, a further error occurs:
TypeError Traceback (most recent call last)
Cell In[19], line 8
6 model.tokenizer = tokenizer
7 #model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False)
----> 8 model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False, model_type="llama")
10 streamer = TextStreamer(tokenizer)
11 outputs = model.generate(inputs, streamer=streamer)
File ~\Anaconda3\envs\talkingBot\lib\site-packages\neural_speed\__init__.py:274, in Model.init_from_bin(self, model_type, model_path, **generate_kwargs)
271 else:
272 generate_kwargs["scratch_size_ratio"] = 35
--> 274 self.model.init_model(model_path, **generate_kwargs)
TypeError: init_model(): incompatible function arguments. The following argument types are supported:
1. (self: neural_speed.llama_cpp.Model, model_path: str, max_new_tokens: int = -1, n_batch: int = 512, ctx_size: int = 1024, seed: int = -1, threads: int = 8, repetition_penalty: float = 1.100000023841858, num_beams: int = 1, do_sample: bool = False, top_k: int = 40, top_p: float = 0.95, temperature: float = 0.8, min_new_tokens: int = 0, length_penalty: float = 1.0, early_stopping: bool = False, n_keep: int = 0, n_discard: int = -1, shift_roped_k: bool = False, batch_size: int = 1, pad_token: int = -1, memory_dtype: str = 'auto', continuous_batching: bool = True, max_request_num: int = 1, scratch_size_ratio: float = 1.0) -> None
Invoked with: <neural_speed.llama_cpp.Model object at 0x00000211D028C770>, 'ne_llama_q.bin'; kwargs: model_name='llama', max_new_tokens=43, do_sample=False, threads=8
Judging from the 'Invoked with' line above, model_name seems to be forwarded into init_model(), which does not accept it. Please, can you update the notebook example?
Hi @raj-ritu17, please share the model link and the full script you used.
Regarding the first error ("positional argument 'model_type' is missing"): we can actually get the model type from the model config directly. Please share more details so that I can reproduce your error.
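For example, a minimal sketch assuming a Hugging Face checkpoint (the model id below is a placeholder for whatever model you converted):

from transformers import AutoConfig

# Placeholder checkpoint id: substitute the model you actually converted.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
print(config.model_type)  # prints "llama" for Llama checkpoints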
@raj-ritu17 Based on the API, you only need to pass model_type='llama'; you do not need to pass model_name here. Please try this:
model.init_from_bin(model_type="llama", model_path="runtime_outs/ne_llama_q_int4_bestla_cfp32_g32.bin")
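For completeness, a minimal end-to-end sketch that puts the pieces from this thread together (the tokenizer checkpoint and the prompt are assumptions; use whatever matches the model you converted):

from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

# Placeholder: load the tokenizer that matches your converted model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

model = Model()
model.tokenizer = tokenizer
# model_type is the required positional argument; model_name is not accepted.
model.init_from_bin(
    model_type="llama",
    model_path="runtime_outs/ne_llama_q_int4_bestla_cfp32_g32.bin",
    max_new_tokens=43,
    do_sample=False,
)

# Example prompt; the streamer prints tokens to stdout as they are generated.
inputs = tokenizer("Who is Andy Grove?", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
outputs = model.generate(inputs, streamer=streamer)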
I still get some issues; I will update the notebook and get back to you later.
@raj-ritu17 @Zhenzhong1, with a few updates this should work:
https://github.com/intel/intel-extension-for-transformers/blob/update_talkingbot_pc/intel_extension_for_transformers/neural_chat/examples/deployment/talkingbot/pc/build_talkingbot_on_pc.ipynb