
[OV]: Load and convert LLMs in original precision

Open eaidova opened this pull request 1 year ago • 2 comments

What does this PR do?

Allow loading bfloat16 and float16 models in their original precision for conversion. This significantly reduces memory consumption and loading time when converting large models.
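The memory saving comes from keeping weights in their checkpoint dtype instead of upcasting them to float32 on load. A minimal sketch of why this matters (illustrative only, using NumPy's float16 since it has no bfloat16 type; the actual optimum-intel loading code may differ):

```python
import numpy as np

# A half-precision weight tensor occupies 2 bytes per element,
# while the same tensor upcast to float32 occupies 4 bytes per element.
# Keeping the original precision during conversion therefore roughly
# halves peak memory for the model weights.
w_fp16 = np.zeros((1024, 1024), dtype=np.float16)
w_fp32 = w_fp16.astype(np.float32)

bytes_fp16 = w_fp16.nbytes  # 2 * 1024 * 1024 bytes
bytes_fp32 = w_fp32.nbytes  # 4 * 1024 * 1024 bytes
```

For a multi-billion-parameter model, the same 2x factor applies to tens of gigabytes of weights, which is where the reduced memory consumption and loading time come from.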

Fixes # (issue)

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

eaidova avatar Jun 24 '24 11:06 eaidova

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Looks great, thanks a lot @eaidova

@echarlaix thanks, we are still investigating the impact on model accuracy and quantization on our side. Could you please hold off on merging these changes until we have the whole picture?

eaidova avatar Jul 01 '24 10:07 eaidova

@IlyasMoutawwakil could you please merge?

eaidova avatar Aug 19 '24 09:08 eaidova