Ella Charlaix


Perfect, thanks @Narsil. I need to wait for updates from the Intel collaboration before merging, so I'll change the PR status to draft temporarily.

Hi @jiqing-feng, would this be a similar integration to the one in [ipex-llm](https://github.com/intel-analytics/ipex-llm/blob/c41730e024965b18c437461e6c11b38848223682/python/llm/src/ipex_llm/transformers/models/llama.py)?

For me it would make sense to keep this integration in [ipex-llm](https://github.com/intel-analytics/ipex-llm/blob/c41730e024965b18c437461e6c11b38848223682/python/llm/src/ipex_llm/transformers/models/llama.py) and to only enable the loading of exported models in optimum-intel (through `IPEXModel`). What do you think?
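For illustration, a minimal sketch of what loading through optimum-intel could look like, assuming an `IPEXModelForCausalLM` class that follows the usual optimum `from_pretrained(..., export=True)` pattern; the class name and checkpoint are illustrative, not the final integration:

```python
# Sketch only: assumes an IPEXModelForCausalLM class following the usual
# optimum `from_pretrained(..., export=True)` pattern; names are illustrative.
from transformers import AutoTokenizer
from optimum.intel import IPEXModelForCausalLM

model_id = "gpt2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True would apply the IPEX optimizations when loading the checkpoint
model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```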

Hi @jiqing-feng, I see that a different llama modeling (along with additional architectures) was introduced in both [ipex](https://github.com/intel/intel-extension-for-pytorch/blob/2ec5bc44be875c4a86f4248c42bcdbccd4b8510a/examples/cpu/inference/python/llm-modeling/modeling_llama.py#L411) and [ipex-llm](https://github.com/intel-analytics/ipex-llm/blob/0b7e78b59235295e0cee37cadd9fc0adc04997ec/python/llm/src/ipex_llm/transformers/models/llama.py#L1904) to enable IPEX optimizations. I think redefining the modeling of transformers...

Hi @Zjq9409, currently the `torch_dtype` parameter is ignored, but enabling the loading of the model in bf16 before exporting it to the OpenVINO format is something that we plan to...
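For context, a minimal sketch of what the planned behavior could look like; as noted above, `torch_dtype` is currently ignored at export time, so this is illustrative rather than working usage:

```python
# Sketch only: illustrates the *planned* behavior described above; today the
# torch_dtype argument is ignored when exporting to the OpenVINO format.
import torch
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # illustrative checkpoint
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,                 # export the PyTorch checkpoint to OpenVINO
    torch_dtype=torch.bfloat16,  # planned: load the model in bf16 before export
)
model.save_pretrained("gpt2_openvino")
```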

To fix the code style test you can do the following:

```
pip install .[quality]
make style
```

> but for some reason output ids are matching locally and not on the runner (two tests with old onnx model)

Do you know where this could come from?

cc @mfuntowicz

Thanks a lot @ashim-mahara! If you don't have time to update it, I can open a PR tomorrow or next week (all the onnx / onnxruntime optimum integration will...