neural-compressor Any example to quantise a text embedding model on Intel Gaudi2?

Open sleepingcat4 opened this issue 1 year ago • 2 comments

I'm looking for an example or documentation showing how to both load and quantise a HF embedding model on Intel Gaudi2. Are there any examples available? I don't want to use Docker, btw.

Jul 14 '24 04:07 sleepingcat4

@sleepingcat4 Please refer to: https://github.com/intel/neural-compressor/tree/bfa27e422dc4760f6a9b1783eee7dae10fe5324f/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/habana_fp8.
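The linked example targets causal language models, so here is a minimal, untested sketch of how the same FP8 flow might be adapted to an embedding model. It assumes the Gaudi software stack (`habana_frameworks`) and `neural_compressor` 3.x are installed directly on the host, with no Docker involved. The function name `run_fp8_step`, the mean-pooling step, and the default `stats_path` are illustrative choices rather than part of the example, and the exact `FP8Config` arguments may differ between releases. The assumption is that you run it once with `mode="MEASURE"` to collect calibration statistics, then again with `mode="QUANTIZE"` to get FP8 inference.

```python
"""Sketch: FP8 measure/quantise passes for a HF embedding model on Gaudi2."""

VALID_MODES = ("MEASURE", "QUANTIZE")


def run_fp8_step(model_name, texts, mode, stats_path="./hqt_output/measure"):
    """Run one pass (MEASURE or QUANTIZE) and return mean-pooled embeddings.

    model_name: any HF encoder checkpoint (caller's choice, not from the issue).
    texts: list of strings used for calibration (MEASURE) or inference (QUANTIZE).
    stats_path: hypothetical default location for the measurement statistics.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")

    # Heavy imports are deferred so this file can be imported without Gaudi software.
    import torch
    import habana_frameworks.torch.core as htcore
    from transformers import AutoModel, AutoTokenizer
    from neural_compressor.torch.quantization import (
        FP8Config,
        convert,
        finalize_calibration,
        prepare,
    )

    # Assumption: called before model loading, as Gaudi FP8 setups of that era did.
    htcore.hpu_set_env()

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    model = model.eval().to("hpu")

    # Assumption: keyword names match the FP8Config of the installed release.
    config = FP8Config(fp8_config="E4M3", mode=mode, dump_stats_path=stats_path)
    if mode == "MEASURE":
        model = prepare(model, config)  # insert observers for calibration
    else:
        model = convert(model, config)  # swap supported modules to FP8

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    batch = batch.to("hpu")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state

    # Mean pooling over non-padding tokens (a common choice, not mandated by the model).
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

    if mode == "MEASURE":
        finalize_calibration(model)  # write the collected statistics to disk
    return embeddings.float().cpu()
```

Comparing the embeddings from the `QUANTIZE` pass against a plain bf16 run (for example via cosine similarity) would be a reasonable sanity check before relying on the FP8 model.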

Jul 17 '24 03:07 NeoZhangJianyu

Thank you! I will experiment with it tomorrow.

Jul 17 '24 18:07 sleepingcat4

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Oct 29 '25 22:10 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

Nov 06 '25 22:11 github-actions[bot]