neural-compressor How to parallelize a model with fake quantization nodes?

Open sheegao opened this issue 1 year ago • 5 comments

If a quantized model contains fake quantization nodes, how can such a model be parallelized, and how can its accuracy be validated on a dataset?

Dec 08 '23 13:12 sheegao

Hi @sheegao, which framework are you using? By 'fake quantization nodes', do you mean quant-dequant pairs?

Dec 13 '23 08:12 mengniwang95

Yes, it means quant-dequant pairs, but I have not found any existing distributed framework that supports this type of model.
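
For readers unfamiliar with the term: a quant-dequant pair rounds a float value onto an integer grid and maps it straight back to float, so the tensor stays in floating point but carries the quantization error. A minimal pure-Python sketch, assuming a signed 8-bit range and a hand-picked scale (the function names are illustrative, not taken from any library):

```python
def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Map a float onto the integer grid and clamp it to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point=0):
    """Map an integer grid point back to a float."""
    return (q - zero_point) * scale


def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize then immediately dequantize: float in, float out."""
    q = quantize(x, scale, zero_point, qmin, qmax)
    return dequantize(q, scale, zero_point)


# Assumed example values; 20.0 lies outside the representable range and is clamped.
values = [0.03, -0.42, 1.0, 20.0]
fq = [fake_quantize(v, scale=0.1) for v in values]
for v, out in zip(values, fq):
    print(f"{v:>6} -> {out:.2f}")
```

Because the output is still an ordinary float tensor, the node behaves like any other elementwise op from the point of view of the surrounding graph.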

Dec 14 '23 03:12 sheegao

Hi @sheegao, you can deploy a fake-quantized torch model directly with torch's DistributedDataParallel API. I don't think this will be a problem.
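
A rough sketch of what that could look like, including an accuracy check on a dataset. Everything specific here is an assumption made for illustration: the toy `FakeQuantMLP` model, the random data, the `evaluate` helper, and the CPU `gloo` backend. `FakeQuantize` is PyTorch's own module with its default settings, not a neural-compressor API.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.ao.quantization import FakeQuantize, disable_observer
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


class FakeQuantMLP(nn.Module):
    """Hypothetical toy model with one quant-dequant node on its activations."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.act_fq = FakeQuantize()  # PyTorch defaults; float in, float out
        self.fc2 = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc2(self.act_fq(torch.relu(self.fc1(x))))


def evaluate(model, dataset, batch_size=16):
    """Each rank scores its own shard; counts are summed across ranks."""
    # Note: DistributedSampler pads the dataset when its length is not
    # divisible by the world size, which can slightly skew the result.
    sampler = DistributedSampler(dataset, shuffle=False)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    counts = torch.zeros(2)  # [correct, total]
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            pred = model(x).argmax(dim=1)
            counts[0] += (pred == y).sum()
            counts[1] += y.numel()
    dist.all_reduce(counts)  # default reduction is SUM
    return (counts[0] / counts[1]).item()


def main():
    # Defaults allow a plain single-process run; torchrun overrides them.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    dist.init_process_group(backend="gloo")

    torch.manual_seed(0)  # same toy data and weights on every rank
    data = TensorDataset(torch.randn(64, 8), torch.randint(0, 4, (64,)))

    model = FakeQuantMLP()
    with torch.no_grad():
        model(data.tensors[0])  # calibration pass: observer records the range
    model.apply(disable_observer)  # freeze scale/zero_point before evaluating

    accuracy = evaluate(DDP(model), data)
    dist.destroy_process_group()
    return accuracy


if __name__ == "__main__":
    print(f"accuracy: {main():.3f}")
```

Saved as a script (say `eval_ddp.py`, a made-up name), this should run either as `python eval_ddp.py` or under `torchrun --nproc_per_node=2 eval_ddp.py`. The weights are random, so the accuracy number itself is meaningless; the point is only that the fake quantization node needs no special handling under data parallelism.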

Dec 14 '23 07:12 xin3he

Yeah, but I'm interested in pipeline parallelism or tensor parallelism rather than data parallelism.

Dec 14 '23 07:12 sheegao

I think this question should be raised with the packages that provide pipeline parallelism or tensor parallelism.

Dec 15 '23 02:12 xin3he

There are no further plans or discussion in this thread, so I'm closing it for now.

Apr 26 '24 06:04 thuang6