Ella Charlaix
Perfect, thanks @Narsil. I need to wait for updates from the Intel collaboration before merging, so I'll change the PR status to draft temporarily.
Hi @jiqing-feng, would it be an integration similar to what was done in [ipex-llm](https://github.com/intel-analytics/ipex-llm/blob/c41730e024965b18c437461e6c11b38848223682/python/llm/src/ipex_llm/transformers/models/llama.py)?
For me it would make sense to keep this integration in [ipex-llm](https://github.com/intel-analytics/ipex-llm/blob/c41730e024965b18c437461e6c11b38848223682/python/llm/src/ipex_llm/transformers/models/llama.py) and to only enable the loading of exported models in optimum-intel (through `IPEXModel`), what do you think?
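For context, here is a minimal sketch of what loading an exported model through optimum-intel could look like. The model id is a placeholder, and the `export=True` behavior of `IPEXModelForCausalLM` is assumed here, not confirmed by this thread:

```python
# Minimal sketch, assuming the IPEXModelForCausalLM entry point in optimum-intel.
# The model id below is a placeholder.
from transformers import AutoTokenizer
from optimum.intel import IPEXModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True is assumed to trigger the IPEX export/optimization at loading time
model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```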
Hi @jiqing-feng, I see that a different llama modeling (along with other additional architectures) was introduced in both [ipex](https://github.com/intel/intel-extension-for-pytorch/blob/2ec5bc44be875c4a86f4248c42bcdbccd4b8510a/examples/cpu/inference/python/llm-modeling/modeling_llama.py#L411) and [ipex-llm](https://github.com/intel-analytics/ipex-llm/blob/0b7e78b59235295e0cee37cadd9fc0adc04997ec/python/llm/src/ipex_llm/transformers/models/llama.py#L1904) to enable ipex optimizations. I think redefining the modeling of transformers...
Hi @Zjq9409, currently the `torch_dtype` parameter is ignored, but enabling the loading of the model in bf16 before exporting it to the OpenVINO format is something that we plan to...
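To illustrate the intended behavior (not supported at the time of this comment), a rough sketch of what loading in bf16 before the OpenVINO export could look like; whether `torch_dtype` will be honored exactly like this is an assumption:

```python
# Sketch of the planned behavior: load the PyTorch model in bf16, then export to OpenVINO.
# torch_dtype is currently ignored, so this is illustrative only.
import torch
from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained(
    "gpt2",                      # placeholder model id
    export=True,                 # export the PyTorch checkpoint to the OpenVINO IR
    torch_dtype=torch.bfloat16,  # desired loading dtype before export (planned support)
)
```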
cc @IlyasMoutawwakil
To fix the code style test you can do the following:
```
pip install .[quality]
make style
```
> but for some reason output ids are matching locally and not on the runner (two tests with old onnx model)

Do you know where this could come from?
cc @mfuntowicz
Thanks a lot @ashim-mahara! If you don't have time to update it, I can open a PR tomorrow or next week (all the onnx / onnxruntime optimum integration will...