intel-extension-for-transformers
Talking bot backend for Windows PC is not working; the notebook needs to be updated
I followed the guidelines here:
https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/examples/deployment/talkingbot/server/backend/README.md
First error: the required positional argument 'model_type' is missing; it is not passed in the example.
TypeError Traceback (most recent call last)
Cell In[17], line 7
5 model = Model()
6 model.tokenizer = tokenizer
----> 7 model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False)
10 streamer = TextStreamer(tokenizer)
TypeError: Model.init_from_bin() missing 1 required positional argument: 'model_type'
So I added the argument:
model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False, model_type="llama")
according to the file:
https://github.com/intel/neural-speed/blob/main/neural_speed/init.py
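For reference, the relevant signature there (also visible in the traceback below) appears to be:

def init_from_bin(self, model_type, model_path, **generate_kwargs):
    ...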
After adding the positional argument, a further error occurs:
TypeError Traceback (most recent call last)
Cell In[19], line 8
6 model.tokenizer = tokenizer
7 #model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False)
----> 8 model.init_from_bin(model_name="llama", model_path="ne_llama_q.bin", max_new_tokens=43, do_sample=False, model_type="llama")
10 streamer = TextStreamer(tokenizer)
11 outputs = model.generate(inputs, streamer=streamer)
File ~\Anaconda3\envs\talkingBot\lib\site-packages\neural_speed\__init__.py:274, in Model.init_from_bin(self, model_type, model_path, **generate_kwargs)
271 else:
272 generate_kwargs["scratch_size_ratio"] = 35
--> 274 self.model.init_model(model_path, **generate_kwargs)
TypeError: init_model(): incompatible function arguments. The following argument types are supported:
1. (self: neural_speed.llama_cpp.Model, model_path: str, max_new_tokens: int = -1, n_batch: int = 512, ctx_size: int = 1024, seed: int = -1, threads: int = 8, repetition_penalty: float = 1.100000023841858, num_beams: int = 1, do_sample: bool = False, top_k: int = 40, top_p: float = 0.95, temperature: float = 0.8, min_new_tokens: int = 0, length_penalty: float = 1.0, early_stopping: bool = False, n_keep: int = 0, n_discard: int = -1, shift_roped_k: bool = False, batch_size: int = 1, pad_token: int = -1, memory_dtype: str = 'auto', continuous_batching: bool = True, max_request_num: int = 1, scratch_size_ratio: float = 1.0) -> None
Invoked with: <neural_speed.llama_cpp.Model object at 0x00000211D028C770>, 'ne_llama_q.bin'; kwargs: model_name='llama', max_new_tokens=43, do_sample=False, threads=8
Judging from the 'Invoked with' line above, model_name seems to be forwarded into init_model(), which does not accept it. Please, can you update the notebook example?
Hi @raj-ritu17, please share the model link and the full script you used.
Regarding the first error ("positional argument 'model_type' is missing"): we can actually get the model type from the model config directly. Please share more details so that I can reproduce your error.
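For example, a minimal sketch assuming a Hugging Face checkpoint (the model id below is a placeholder for whatever model you converted):

from transformers import AutoConfig

# Placeholder checkpoint id: substitute the model you actually converted.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
print(config.model_type)  # prints "llama" for Llama checkpoints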
@raj-ritu17 Based on the API, you only need to pass model_type='llama'; you do not need to pass model_name here. Please try this:
model.init_from_bin(model_type="llama", model_path="runtime_outs/ne_llama_q_int4_bestla_cfp32_g32.bin")
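For completeness, a minimal end-to-end sketch that puts the pieces from this thread together (the tokenizer checkpoint and the prompt are assumptions; use whatever matches the model you converted):

from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

# Placeholder: load the tokenizer that matches your converted model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

model = Model()
model.tokenizer = tokenizer
# model_type is the required positional argument; model_name is not accepted.
model.init_from_bin(
    model_type="llama",
    model_path="runtime_outs/ne_llama_q_int4_bestla_cfp32_g32.bin",
    max_new_tokens=43,
    do_sample=False,
)

# Example prompt; the streamer prints tokens to stdout as they are generated.
inputs = tokenizer("Who is Andy Grove?", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
outputs = model.generate(inputs, streamer=streamer)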
I still get some issues; I will update the notebook and get back to you later.
@raj-ritu17 @Zhenzhong1, with a few updates this should work:
https://github.com/intel/intel-extension-for-transformers/blob/update_talkingbot_pc/intel_extension_for_transformers/neural_chat/examples/deployment/talkingbot/pc/build_talkingbot_on_pc.ipynb